Journal of the Text Encoding Initiative (Nov 2024)
Keeping It Open: A TEI-based Publication Pipeline for Historical Documents
Abstract
In response to the growing number of projects that draw on historical archives, books, and other materials, and to the rapidly increasing need for digital tools tailored to such work, the DAHN project (Dispositif de soutien à l’Archivistique et aux Humanités Numériques) developed a complete open-source pipeline of tools and methods for producing a digital scholarly edition from scanned handwritten material. The pipeline comprises six steps (digitization, segmentation, transcription, post-OCR processing, encoding, and publication) and focuses on historical documents, particularly ego documents. It is built around TEI, which serves as a pivot format, to ensure robustness, sustainability, and reusability. Beyond encoding in TEI, we chose tools compatible with it, such as eScriptorium for segmentation and transcription and TEI Publisher for publication. To ease reuse, we also thoroughly documented both the development of the pipeline and each of its steps.
Keywords