Information (Jun 2024)

Genre Classification of Books in Russian with Stylometric Features: A Case Study

  • Natalia Vanetik
  • Margarita Tiamanova
  • Genady Kogan
  • Marina Litvak

DOI
https://doi.org/10.3390/info15060340
Journal volume & issue
Vol. 15, no. 6
p. 340

Abstract

Within the literary domain, genres function as fundamental organizing concepts that provide readers, publishers, and academics with a unified framework. Genres are discrete categories that are distinguished by common stylistic, thematic, and structural components. They facilitate the categorization process and improve our understanding of a wide range of literary expressions. In this paper, we introduce a new dataset for genre classification of Russian books, covering 11 literary genres. We also perform dataset evaluation for the tasks of binary and multi-class genre identification. Through extensive experimentation and analysis, we explore the effectiveness of different text representations, including stylometric features, in genre classification. Our findings clarify the challenges present in classifying Russian literature by genre, revealing insights into the performance of different models across various genres. Furthermore, we address several research questions regarding the difficulty of multi-class classification compared to binary classification, and the impact of stylometric features on classification accuracy.

Keywords