RIDE (Sep 2017)

Corpus of Spanish Golden-Age Sonnets

  • José Calvo Tello

DOI
https://doi.org/10.18716/ride.a.6.4
Journal volume & issue
Vol. 6

Abstract

This paper reviews a TEI corpus of sonnets from the Spanish Golden Age. The 52 authors represented in the collection include Cervantes, Lope, Quevedo, Tirso, Calderón and Góngora. In total, the corpus contains more than 5000 sonnets. The project is currently under development at the University of Alicante, Spain. One of the strongest aspects of this corpus is the metrical annotation of each verse. The researchers have already analysed the corpus using topic modelling, a technique well suited to the structure of the collection and the size of the texts. The weakest aspect of the collection is the file metadata: most of it is redundant, and some important items (e.g. identifiers of texts, author, collection, source) are missing. The corpus is available as a GitHub repository, a good practice that makes it easy to clone all the data, track changes and preserve the corpus.

Keywords