Journal of Open Humanities Data (Nov 2024)
Era- and Genre-Specific Stop Word Lists for Low-Resource Computational Research: A Classical Latin 'Exemplum'
Abstract
In this data paper, we argue that computational researchers, particularly those working in low-resource contexts, should consult linguistic specialists to create stop lists targeted at specific eras, genres, authors, or contexts. We offer an exemplum of stop lists targeted at Augustan Latin poetry. Our open-access stop lists, available as standalone files alongside a command-line Python script, can serve as a starting point for other eras or genres of Latin literature. More broadly, the transdisciplinary and collaborative process by which these stop lists were created is of significant benefit to low-resource computational linguistics research teams.
Keywords