Caracteres: Estudios Culturales y Críticos de la Esfera Digital (Nov 2016)
Stylometric analysis applied to Spanish literature: crime and historical novels (El análisis estilométrico aplicado a la literatura española: las novelas policiacas e históricas)
Abstract
This paper demonstrates that a computer can determine the authorship of a text. To this end, we built a corpus of 122 contemporary novels written in Spanish (69 historical novels, 50 crime novels, and 3 westerns). We then analyzed the corpus with stylo, a stylometric analysis package written in the R programming language. Of the several types of analysis the package offers, we chose the simplest: cluster analysis. Using only the 100 most frequent words (MFW), the computer grouped the works of each author together and assigned those published under a pseudonym to their true authors, without making any errors.
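To illustrate the general idea of cluster analysis on most-frequent-word profiles, the following is a minimal sketch in Python rather than R. It is not stylo's code or API. The function names (`mfw_profiles`, `zscore`, `delta`, `cluster`), the Burrows-style Delta distance, the average-linkage merging, and the toy texts are all assumptions made for illustration, since the abstract does not state which distance or linkage was used.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    # Lowercased runs of letters; the pattern also matches accented Spanish letters.
    return re.findall(r"[^\W\d_]+", text.lower())


def mfw_profiles(texts, n_mfw=100):
    """Relative frequencies of the corpus-wide n_mfw most frequent words, per text."""
    counts = {name: Counter(tokenize(body)) for name, body in texts.items()}
    total = Counter()
    for c in counts.values():
        total.update(c)
    mfw = [word for word, _ in total.most_common(n_mfw)]
    profiles = {}
    for name, c in counts.items():
        size = sum(c.values())
        profiles[name] = [c[word] / size for word in mfw]
    return mfw, profiles


def zscore(profiles):
    """Standardize each word's frequency across texts (assumed preprocessing step)."""
    names = list(profiles)
    columns = list(zip(*(profiles[n] for n in names)))
    mus = [mean(col) for col in columns]
    sds = [pstdev(col) or 1.0 for col in columns]  # guard against zero variance
    return {n: [(x - m) / s for x, m, s in zip(profiles[n], mus, sds)]
            for n in names}


def delta(a, b):
    # Mean absolute difference of z-scores (a Burrows-style Delta; an assumption here).
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cluster(vectors, k):
    """Average-linkage agglomerative clustering, merging until k clusters remain."""
    clusters = [[name] for name in vectors]

    def dist(c1, c2):
        return mean(delta(vectors[a], vectors[b]) for a in c1 for b in c2)

    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: dist(clusters[p[0]], clusters[p[1]]))
        clusters[i] += clusters.pop(j)
    return clusters


# Hypothetical toy "corpus": two invented authors with different word habits.
toy = {
    "A1": "el perro y el gato y el lobo " * 20,
    "A2": "el lobo y el perro y el zorro " * 20,
    "B1": "la casa de la calle de la plaza " * 20,
    "B2": "la plaza de la casa de la torre " * 20,
}
_, toy_profiles = mfw_profiles(toy, n_mfw=100)
groups = cluster(zscore(toy_profiles), k=2)
```

On this toy input the two texts of each invented author end up in the same group. A real study would work from full novels and would typically inspect the whole dendrogram rather than cut it at a fixed number of clusters.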