Proceedings of the XXth Conference of Open Innovations Association FRUCT (May 2021)

Towards Automatic Modelling of Thematic Domains of a National Literature: Technical Issues in the Case of Russian

  • Tatiana Sherstinova
  • Anna Moskvina
  • Margarita A. Kirina

DOI
https://doi.org/10.23919/FRUCT52173.2021.9435451
Journal volume & issue
Vol. 29, no. 1
pp. 313–323

Abstract

A significant share of modern technologies for developing artificial intelligence systems and for digital analytics of diverse data relies on methods of computer text processing (NLP, speech technologies). However, NLP methods are applied primarily to specialized texts, such as scientific literature, technical documentation, and news, or to social media discourse. Fiction usually remains outside the focus of NLP practitioners, since the fictional world seems to be of less significance or less informational value from a practical point of view. Moreover, because literary texts are poetic and metaphorical in nature, applying some NLP methods (e.g., topic modeling) to fiction has turned out to be more complicated. At the same time, the influence of literature both on the consciousness of individuals and on the formation of social values can hardly be overestimated. Besides, making computers understand fiction the way humans do would be a real challenge for artificial intelligence. The article puts forward the idea of modeling the thematic areas of literature on a national scale, which should reveal the main thematic domains of a national literature as a whole. This will allow a better understanding of the cultural traits of the national consciousness in a given historical period and contribute to both literary studies and practical tasks. Methodological approaches to determining and modeling the themes of literary works are considered, the technical difficulties arising in the process are described, and ways to solve them are suggested. The proposed methodology has been implemented in the design of the Russian short stories corpus (the first third of the 20th century) and can be applied in the development of artificial intelligence systems that process large volumes of literary texts in any language.

Keywords