Proceedings of the XXth Conference of Open Innovations Association FRUCT (May 2021)

Topic Modeling of Russian-Language Texts Using the Parts-of-Speech Composition of Topics (on the Example of Volunteer Movement Semantics in Social Media)

  • Anna Maltseva,
  • Natalia Shilkina,
  • Olesia Makhnytkina,
  • Evgenii Evseev,
  • Mikhail Matveev

DOI
https://doi.org/10.23919/FRUCT52173.2021.9435475
Journal volume & issue
Vol. 29, no. 1
pp. 247 – 253

Abstract

The article presents a new approach to topic modeling of texts: topic modeling based on the part-of-speech composition of topics. We do not treat parts of speech as a gnoseological concept that merely reflects how language is formally classified. Rather, we regard parts of speech as part of a person's linguistic competence: they are used in communication and perform specific functions in the communicative process. Topic modeling is understood here as the construction of semantic models of a text corpus. The goal is to study how modern movements and communities are represented in speech. The hypothesis is that the forums of a social movement reflect its characteristics, nature, and activities. Three groups on the Russian social network VKontakte were chosen as the empirical object: "All for the Victory!", "Center of (City) Volunteers of St. Petersburg", and "Volunteers of St. Petersburg". Topic modeling was carried out using latent Dirichlet allocation (LDA) as implemented in the Gensim package, together with the Mallet implementation. Model quality was validated using the coherence measure. Analyzing web texts on volunteer semantics through the part-of-speech composition of topics made it possible to identify features that characterize the group identity, emotionality, and joint activities of Russian volunteers.
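Besides fitting the LDA model itself, the abstract names two computational steps: describing each topic by the part-of-speech make-up of its top words, and scoring topics by coherence. The following is a minimal pure-Python sketch of both. It assumes that topics are already available as ordered lists of lemmatized top words (for example, taken from a fitted Gensim LDA model) and that part-of-speech tags come from a plain dictionary. The lexicon, the toy corpus, the function names and the OpenCorpora-style tags are hypothetical illustrations. The abstract does not say which coherence measure the authors used, so the UMass measure is swapped in here as one common choice, not necessarily theirs.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Set


def pos_composition(topic_words: Sequence[str], pos_of: Dict[str, str]) -> Dict[str, float]:
    """Share of each part of speech among a topic's top words.

    pos_of is a hypothetical word -> tag lookup; words missing from it
    are counted under the placeholder tag "UNKN".
    """
    counts = Counter(pos_of.get(w, "UNKN") for w in topic_words)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def umass_coherence(topic_words: Sequence[str], docs: List[Set[str]]) -> float:
    """Raw UMass coherence of a topic's top words (ordered by probability).

    Sums log((D(w_m, w_l) + 1) / D(w_l)) over all pairs with l < m,
    where D(...) is the number of documents containing all given words.
    """
    def doc_freq(*words: str) -> int:
        return sum(1 for d in docs if all(w in d for w in words))

    score = 0.0
    for m in range(1, len(topic_words)):
        for l in range(m):
            d_l = doc_freq(topic_words[l])
            if d_l == 0:
                continue  # skip words absent from the reference corpus
            co = doc_freq(topic_words[m], topic_words[l])
            score += math.log((co + 1) / d_l)
    return score


# Hypothetical toy lexicon and corpus of lemmatized Russian words.
POS = {"волонтер": "NOUN", "помогать": "INFN", "добрый": "ADJF",
       "вместе": "ADVB", "акция": "NOUN"}
DOCS = [
    {"волонтер", "помогать", "акция"},
    {"волонтер", "добрый", "вместе"},
    {"помогать", "вместе", "акция"},
]
TOPIC = ["волонтер", "помогать", "акция", "добрый"]

if __name__ == "__main__":
    print(pos_composition(TOPIC, POS))  # {'NOUN': 0.5, 'INFN': 0.25, 'ADJF': 0.25}
    print(round(umass_coherence(TOPIC, DOCS), 3))  # -0.981
```

In a real pipeline the dictionary lookup would be replaced by a morphological analyzer for Russian. Coherence would more likely come from Gensim's `CoherenceModel` (e.g. `coherence="u_mass"` or `"c_v"`), whose aggregated scores may be normalized differently from the raw sum computed here.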

Keywords