Lithuanian Journal of Statistics (Dec 2022)

Discriminating poetry and prose using syllable statistics

  • Gediminas Murauskas,
  • Marijus Radavičius

DOI
https://doi.org/10.15388/LJS.2022.31988
Journal volume & issue
Vol. 61

Abstract

The aim of the paper is to construct a universal classifier that discriminates short Lithuanian text excerpts of poetry from those of prose. Here, universality means that the classifier is relatively insensitive to text content and to the author's style. Since syllables represent phonetic properties and are less sensitive to text content than words, classifier training is based on the frequencies of syllables in the texts to be classified. The text data are taken from the digitized library http://ebiblioteka.mkp.emokykla.lt. The error rate of the trained classifier, applied to test excerpts of 100 words, is less than 5%.
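
To illustrate the kind of feature the abstract describes, the following minimal sketch computes relative syllable frequencies for a short excerpt. The abstract does not state the syllabification rules or the classifier used in the paper, so everything here is an assumption: the vowel set, the crude "consonants followed by vowels" chunking rule, and the function names `pseudo_syllables` and `syllable_frequencies` are hypothetical and only approximate real Lithuanian syllabification.

```python
import re
from collections import Counter

# Assumption: Lithuanian vowel letters, including those with diacritics.
VOWELS = "aąeęėiįyouųū"
# Words are runs of lowercase Lithuanian letters.
WORD_RE = re.compile(r"[a-ząčęėįšųūž]+")
# Crude pseudo-syllable: an optional consonant run followed by a vowel run.
CHUNK_RE = re.compile(rf"[^{VOWELS}]*[{VOWELS}]+")


def pseudo_syllables(word):
    """Split a lowercase word into rough syllable-like chunks (a heuristic)."""
    chunks = CHUNK_RE.findall(word)
    if not chunks:
        # No vowel found: treat the whole word as one chunk.
        return [word]
    # Attach any trailing consonants to the last chunk.
    consumed = sum(len(c) for c in chunks)
    chunks[-1] += word[consumed:]
    return chunks


def syllable_frequencies(text):
    """Return the relative frequency of each pseudo-syllable in the text."""
    counts = Counter()
    for word in WORD_RE.findall(text.lower()):
        counts.update(pseudo_syllables(word))
    total = sum(counts.values())
    return {s: n / total for s, n in counts.items()} if total else {}


if __name__ == "__main__":
    print(syllable_frequencies("Labas rytas"))
```

Frequency vectors of this kind, computed for each 100-word excerpt, could then be passed to any standard classifier; which classifier the authors trained is not stated in the abstract.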

Keywords