Journal of Open Humanities Data (Nov 2023)
Translation Alignment for Ancient Greek: Annotation Guidelines and Gold Standards
Abstract
This paper presents three datasets of Ancient Greek texts manually aligned at the word level with translations in English (Grc-Eng), Portuguese (Grc-Por) and Latin (Grc-Lat). The datasets were produced by two domain experts through annotation on the Ugarit Translation Alignment Editor (https://ugarit.ialigner.com/). The quality of each dataset was measured through Inter-Annotator Agreement (IAA), which exceeded 80%. Each dataset contains the aligned pairs and an Annotation Style Guide. It serves both as a Gold Standard for the translation alignment of Ancient Greek, supporting the evaluation of automatic translation alignment models, and as high-quality training data. The Annotation Style Guide provides a starting point for approaching the task of translation alignment in research and teaching. The data is stored on GitHub and Zenodo.
Keywords