Modernization, Innovation, Development (Модернизация, инновация, развитие), Oct 2024

Methodology for extracting narratives from social media big data

  • E. Yu. Petrov,
  • A. Yu. Sarkisova,
  • D. O. Dunaeva,
  • A. S. Voronov,
  • M. G. Myagkov

DOI
https://doi.org/10.18184/2079-4665.2024.15.3.404-420
Journal volume & issue
Vol. 15, no. 3
pp. 404 – 420

Abstract


Purpose: to present the experience of developing and testing a methodology for extracting a system of narratives about a socially significant phenomenon from authentic social network big data, using the example of narratives about COVID-19 vaccination on the Russian social network VKontakte during the pandemic.

Methods: automated data analysis was performed with the tools of the PolyAnalyst analytical platform: topic modeling (the PLSA method), text indexing algorithms with a sentence identification stage, clustering, data aggregation, data normalization, and calculation of a quantitative index. Measures of keyword proximity were calculated in Python, and partial manual markup and data validation were also carried out.

Results: 4.5 million messages relevant to COVID-19 vaccination, published on VKontakte from 1 January 2020 to 1 March 2023, were reduced to 237 stable narratives, and a popularity index was calculated for each narrative. The most popular narrative, for example, was “Employers put pressure on people to get vaccinated”, supported by 76,118 texts. The study produced a dataset of these 237 narratives.

Conclusions and Relevance: the developed toolkit is universal: the methodology can be adapted to any relevant topic, requiring only adjustments to the input parameters of topic modeling. The dataset is planned to be introduced into scientific circulation as up-to-date material for studying public opinion on vaccination in Russia. The results contribute to international research on public opinion and communication in crises and can serve as a basis for practical measures to improve the quality of public communication and decision-making at all levels of government.
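The abstract names PLSA topic modeling and a per-narrative popularity index but gives neither formula. As a rough illustration only, the pure-Python sketch below fits PLSA, where P(w|d) = Σ_z P(z|d)·P(w|z), with the standard expectation-maximization updates on a toy corpus, then computes a hypothetical popularity index: the number and share of texts whose dominant topic is z. The function names `plsa`, `normalize` and `popularity_index`, the toy texts, and the index definition are all assumptions made for this example. They are not the PolyAnalyst implementation or the authors' actual index.

```python
import random
from collections import Counter


def normalize(weights):
    """Rescale a {key: weight} dict so the weights sum to 1 (uniform if all zero)."""
    total = sum(weights.values())
    if total == 0:
        return {k: 1 / len(weights) for k in weights}
    return {k: v / total for k, v in weights.items()}


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA, P(w|d) = sum_z P(z|d) P(w|z), by expectation-maximization.

    counts: one {word: count} mapping per document.
    Returns (p_z_d, p_w_z): a {topic: prob} dict per document and
    a {word: prob} dict per topic.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in counts for w in doc})
    topics = range(n_topics)

    # Seeded random initialization of P(z|d) and P(w|z); the offset avoids exact zeros.
    p_z_d = [normalize({z: rng.random() + 1e-3 for z in topics}) for _ in counts]
    p_w_z = [normalize({w: rng.random() + 1e-3 for w in vocab}) for _ in topics]

    for _ in range(n_iter):
        # Expected counts n(d,w) * P(z|d,w), accumulated for the M-step.
        acc_w_z = [dict.fromkeys(vocab, 0.0) for _ in topics]
        acc_z_d = [dict.fromkeys(topics, 0.0) for _ in counts]
        for d, doc in enumerate(counts):
            for w, n in doc.items():
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                post = [p_z_d[d][z] * p_w_z[z][w] for z in topics]
                total = sum(post)
                if total == 0:
                    continue  # word numerically impossible under every topic
                for z in topics:
                    r = n * post[z] / total
                    acc_w_z[z][w] += r
                    acc_z_d[d][z] += r
        # M-step: renormalize the expected counts into probabilities.
        p_w_z = [normalize(acc) for acc in acc_w_z]
        p_z_d = [normalize(acc) for acc in acc_z_d]
    return p_z_d, p_w_z


def popularity_index(p_z_d):
    """Hypothetical index: count and share of texts whose dominant topic is z."""
    dominant = Counter(max(dist, key=dist.get) for dist in p_z_d)
    n_docs = len(p_z_d)
    return {z: (n, n / n_docs) for z, n in sorted(dominant.items())}


# Toy usage on invented, pre-tokenized texts (illustration only).
texts = [
    "employer pressure vaccination job",
    "employer pressure job vaccination",
    "vaccine side effects fever",
    "fever side effects vaccine",
    "employer job pressure",
]
counts = [Counter(t.split()) for t in texts]
p_z_d, p_w_z = plsa(counts, n_topics=2, n_iter=100)
print(popularity_index(p_z_d))
```

Pure Python keeps the sketch dependency-free and makes each EM step visible. A corpus of millions of messages would instead call for sparse matrices and a vectorized implementation.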

Keywords