Applied Sciences (May 2024)

GreenRu: A Russian Dataset for Detecting Mentions of Green Practices in Social Media Posts

  • Olga Zakharova
  • Anna Glazkova

DOI
https://doi.org/10.3390/app14114466
Journal volume & issue
Vol. 14, no. 11
p. 4466

Abstract

Green practices are social practices that aim to harmonize the relations between people and the natural environment. They may involve minimizing the use of resources and the generation of waste and emissions. Detecting green practices in social media posts helps to understand which green practices are currently common and to develop recommendations for scaling them up to reduce environmental problems. This paper describes GreenRu, a novel Russian social media dataset for detecting mentions of green practices related to waste management. It has sentence-level markup and consists of 1326 posts collected from Russian online communities. The total number of mentions of green waste practices is 3765. The paper assesses the effectiveness of multi-label and one-versus-rest BERT-based models for detecting mentions of green practices in social media posts and compares several data augmentation methods in terms of both classification metrics and human evaluation. To augment the dataset, a backtranslation method and generative language models, such as RuGPT, RuT5, and ChatGPT, were used. The results enable researchers to monitor green waste practices on social networks and to develop environmental policies. Additionally, GreenRu can support machine learning models that analyze social media content, assess the prevalence and effectiveness of green waste practices, and identify ways to expand them.

Keywords