Journal of the Text Encoding Initiative (Feb 2022)

Handwritten Text Recognition Best Practice in the Beta maṣāḥǝft workflow

  • Hizkiel Mitiku Alemayehu

DOI
https://doi.org/10.4000/jtei.4109

Abstract

This contribution describes the workflow used to transcribe manuscripts from the Ethiopian and Eritrean tradition. The goal of the workflow is to obtain a TEI file with an initial text transcription that profits from a wealth of machine-generated information collected through community-based contributions. The author first sets out the framework of interest for this effort, then discusses the available state-of-the-art options and the workflow actually implemented. It is argued that, for this specific use case, a workflow that favors expert post-processing in the TEI over refinement of the preprocessing techniques is preferable. Large quantities of published text, although not 100% correct, can still be used and can provide users with information reusable for research when published in a collaboratively edited and open environment.

Keywords