From OCR to Content Interpretation: Towards a Scalable Workflow for Arabic Literature in the Digital Humanities

Authors

  • Maura Tarquini, Università degli Studi di Sassari
  • Elisa Gugliotta, Université Grenoble Alpes

DOI:

https://doi.org/10.58015/2036-2293/806

Keywords:

Digital Humanities, Arabic literature, OCR

Abstract

This article explores how Arabic literary texts can function as epistemic devices in Digital Humanities when computational methods are aligned with qualitative interpretation. After situating Arabic DH within global digitization efforts and pedagogical frameworks, we present an OCR-to-analysis workflow tailored to Arabic, comparing Tesseract and Qari-OCR. Finally, a qualitative reading of al-Aswānī’s Awrāq ʿIṣṣām ʿAbd al-ʿĀṭī highlights anger, hatred, and sadness as dominant emotions shaping a narrative of alienation and critique. The study demonstrates how digitization, analysis, and interpretation converge to enrich research and pedagogy within our project.
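The abstract does not state which criteria were used to compare Tesseract and Qari-OCR. One widely used yardstick for this kind of comparison is the character error rate (CER): the character-level edit distance between an engine's output and a ground-truth transcription, divided by the length of that transcription. The minimal Python sketch below illustrates the metric. It is not the authors' evaluation code, and the function names (`levenshtein`, `cer`) and sample strings are hypothetical. Applying Unicode NFC normalization first is an assumption, intended to prevent differently composed Arabic characters (e.g. hamza forms) from counting as errors.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of an OCR hypothesis against a reference.

    Assumption: both strings are NFC-normalized so that composed and
    decomposed Arabic characters compare as equal.
    """
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    ground_truth = "كتاب"  # hypothetical reference word
    engine_a = "كتاب"      # hypothetical output: exact match
    engine_b = "كناب"      # hypothetical output: ت misread as ن
    print(cer(ground_truth, engine_a))  # 0.0
    print(cer(ground_truth, engine_b))  # 0.25
```

Lower values are better. In practice, the same ground-truth pages would be run through both engines and the CER would be averaged per page or per corpus.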

Author Biography

Maura Tarquini, Università degli Studi di Sassari

Maura Tarquini holds a research contract at the University of Sassari within the RĀBIṬA project, dedicated to the digitization of Arabic literature. Since 2017, she has been teaching Arabic language at the University of Cagliari. She obtained her PhD in Arabic Dialectology in 2015 at Sapienza University of Rome, with a dissertation on Tunisian Arabic.

Published

09 Dec 2025

How to Cite

Tarquini, M., and E. Gugliotta. “From OCR to Content Interpretation: Towards a Scalable Workflow for Arabic Literature in the Digital Humanities”. Testo e Senso, vol. 1, no. 29, Dec. 2025, pp. 115-28, doi:10.58015/2036-2293/806.

Section

Digital Humanities