Anuario Lope de Vega: Texto, Literatura, Cultura (Jan 2024)
The Moderniſa Project: Orthographic Modernization of Spanish Golden Age Dramas with Language Models
Abstract
The increasing application of computational methods to the literature of the Spanish Golden Age has revealed the need to automate the modernization of its texts so that they can be compared and analyzed consistently. This study pioneers the use of Natural Language Processing (NLP) techniques to transform Spanish Golden Age texts (circa 1590-1680) into modern, normalized Spanish (RAE 2010). The research uses the transformer architecture to train and evaluate models on a corpus of Golden Age dramas. The models show promise in handling difficult typographical marks and context-sensitive words, but struggle with proper nouns and orthographic variations. Evaluated with several metrics common in the specialized literature, the tool demonstrates potential as a valuable resource for historians, philologists, and digital humanists. Limitations include the specificity of the training corpus and inconsistencies in punctuation and spelling observed even in modernized texts. This research offers a novel, scalable alternative to the manual modernization of Golden Age Spanish literature, enabling further computational studies in the field.
Keywords