Classification of domestic violence Persian textual content in social media based on topic modeling and ensemble learning

Meysam Salehi; Shahrbanoo Ghahari

Heliyon (Nov 2024)


Affiliations

Meysam Salehi: Department of Mental Health, School of Behavioral Sciences and Mental Health (Tehran Institute of Psychiatry), Iran University of Medical Sciences, Tehran, Iran
Shahrbanoo Ghahari (corresponding author): Department of Mental Health, School of Behavioral Sciences and Mental Health (Tehran Institute of Psychiatry), Iran University of Medical Sciences, Tehran, Iran

Journal volume & issue: Vol. 10, no. 22
Article number: e39953

Abstract


Objective: Monitoring social networks makes it possible to categorize domestic violence (DV) content and extract practical knowledge for preventive interventions, and a large volume of Persian-language DV content has been generated on social networks since the COVID-19 pandemic. This research therefore aims to build the best-performing classification model for Persian DV textual content, first labeling the content through topic modeling and then combining algorithms through ensemble learning.

Method: Persian textual data were collected equally and randomly from Telegram, Twitter, and Instagram between April 2020 and April 2023 using hashtags related to domestic violence, and were then topic-modeled with the Latent Dirichlet Allocation (LDA) algorithm. For each document, the probability of each topic was extracted, and the topic with the highest probability was assigned as that document's label. After feature extraction from the labeled dataset, the Stacking and Voting ensemble learning methods were applied.

Result: Analysis of 337,287 texts revealed five topics, including family crime news, war violence, women's rights, and violent reactions. The stacking method outperformed the voting method, achieving 96.4577% precision, 96.4499% accuracy, 96.4499% recall, and 96.4475% F-score.

Conclusion: According to the findings, practical knowledge of the extracted topics can help mental health centers make preventive decisions. Moreover, the resulting model performed best among the models built in this study for multi-class classification of Persian DV texts for social media monitoring.
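The labeling and evaluation steps described in the Method and Result can be sketched as follows. This is a minimal, dependency-free Python illustration under stated assumptions, not the authors' implementation: the document-topic matrix is assumed to come from an already fitted LDA model, plain hard (majority) voting stands in for the voting ensemble (the abstract does not say whether hard or soft voting was used, and stacking is not shown), and the metrics are assumed to be support-weighted averages, which the abstract does not state. The function names `dominant_topic_labels`, `hard_vote`, and `weighted_scores`, and all numbers in the demo, are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def dominant_topic_labels(doc_topic: Sequence[Sequence[float]]) -> list[int]:
    """Label each document with its highest-probability topic index.

    doc_topic[d][k] is assumed to be P(topic k | document d), e.g. a row of
    the document-topic matrix of an already fitted LDA model (assumption).
    Ties go to the lowest topic index (max returns the first maximum).
    """
    return [max(range(len(row)), key=row.__getitem__) for row in doc_topic]


def hard_vote(predictions: Sequence[Sequence[int]]) -> list[int]:
    """Majority (hard) voting across base classifiers (hypothetical helper).

    predictions[m][i] is base classifier m's label for sample i. On a tie,
    the label first encountered (from the earliest classifier) wins, since
    Counter.most_common keeps first-encountered order for equal counts.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def weighted_scores(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy plus support-weighted precision, recall, and F1 (assumed averaging)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n = len(y_true)
    support = Counter(y_true)
    pairs = list(zip(y_true, y_pred))
    scores = {"accuracy": sum(t == p for t, p in pairs) / n,
              "precision": 0.0, "recall": 0.0, "f1": 0.0}
    for k in support:  # classes absent from y_true would get zero weight anyway
        tp = sum(t == k and p == k for t, p in pairs)
        predicted_k = sum(p == k for _, p in pairs)
        prec = tp / predicted_k if predicted_k else 0.0
        rec = tp / support[k]
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = support[k] / n  # share of documents whose true label is k
        scores["precision"] += weight * prec
        scores["recall"] += weight * rec
        scores["f1"] += weight * f1
    return scores


if __name__ == "__main__":
    # Made-up document-topic probabilities: 5 documents, 3 topics.
    theta = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.3, 0.5],
             [0.6, 0.3, 0.1], [0.3, 0.4, 0.3]]
    y_true = dominant_topic_labels(theta)  # [0, 1, 2, 0, 1]
    # Hypothetical label predictions from three base classifiers.
    base_preds = [[0, 1, 2, 0, 0],
                  [0, 1, 1, 0, 1],
                  [1, 2, 2, 1, 0]]
    y_vote = hard_vote(base_preds)  # [0, 1, 2, 0, 0]
    scores = weighted_scores(y_true, y_vote)
    print({name: round(value, 4) for name, value in scores.items()})
    # Support-weighted recall always equals accuracy.
    assert math.isclose(scores["recall"], scores["accuracy"])
```

The weighted-average assumption has a visible consequence: weighted recall reduces to accuracy, because the sum over classes of (n_k/N)(TP_k/n_k) is just the total of TP_k divided by N. That would be consistent with the identical 96.4499 accuracy and recall reported above, although the abstract does not state which averaging scheme was used.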

Published in Heliyon

ISSN: 2405-8440 (Online)
Publisher: Elsevier
Country of publisher: United Kingdom
LCC subjects: Science: Science (General); Social Sciences: Social sciences (General)
Website: https://www.cell.com/heliyon/home
