RIDE (Dec 2022)

TEITOK, a visual solution for XML/TEI encoding: editing, annotating and hosting linguistic corpora

  • Pilar Arrabal Rodríguez

DOI
https://doi.org/10.18716/ride.a.15.5
Journal volume & issue
Vol. 15

Abstract

TEITOK is a web-based system designed to bring scholarly editing and computational linguistics together for the purpose of creating and hosting online language corpora. The system offers a visually attractive environment for digital editing based on the XML/TEI standard. TEITOK provides automatic processes for carrying out many linguistic text-processing tasks. It offers an intuitive interface through which researchers and corpus creators, who are not always computer literate, can manage corpus maintenance and error correction. The tokenization strategy in TEITOK links the different levels of editing and annotation of each word within a single XML document, so that the information can be retrieved later. The result is a tool for editing, annotating, and exploiting corpora, backed by a powerful search engine. TEITOK stands out for its high degree of customisation and its adaptability to a wide variety of corpora. In this article we analyse its utilities, oriented mainly towards the creation of historical corpora, taking as a case study Oralia diacrónica del español (ODE).

Keywords