Multilingual Detection of Cyberbullying in Mixed Urdu, Roman Urdu, and English Social Media Conversations

Fakhra Razi; Naveed Ejaz

doi:10.1109/ACCESS.2024.3432908

IEEE Access (Jan 2024)

Affiliations

Fakhra Razi: ORCiD; Department of Computing and Technology, Iqra University, Islamabad Campus, Islamabad, Pakistan
Naveed Ejaz: ORCiD; Department of Computing and Technology, Iqra University, Islamabad Campus, Islamabad, Pakistan

DOI: https://doi.org/10.1109/ACCESS.2024.3432908
Journal volume & issue: Vol. 12
pp. 105201 – 105210

Abstract

Automatic cyberbullying detection in social media is increasingly vital because social networks are integral to people’s lives and cyberbullying has a severe impact. Cyberbullying involves intentional, repetitive, aggressive behaviour meant to harm others online. Among Urdu-speaking communities worldwide, it is common to mix Urdu, Roman Urdu, and English in social media conversations. Existing research and detection methods overlook this mixed-language usage and fail to address cyberbullying across these languages comprehensively. Additionally, there is no dataset in Urdu and Roman Urdu covering the repetition and intent-to-harm components of cyberbullying. This research addresses the gap by developing and annotating a dataset that captures linguistic variations in cyberbullying instances across Urdu, Roman Urdu, and English and covers all three components of cyberbullying: aggression, repetition, and intent to harm. In addition to the dataset, a framework for detecting cyberbullying is proposed. The framework classifies text messages as aggressive or non-aggressive and introduces novel quantitative measures for repetition and for the level of intent to cause harm. It then classifies cyberbullying by applying thresholds to the measures of aggression, repetition, and intent to harm, integrating all three aspects. Results are reported on the proposed dataset for aggression detection using fine-tuned m-BERT and MuRIL, combined with the measures of repetition and intent to harm. Additional experiments demonstrate the impact of repetition and intent to harm on cyberbullying classification. The best results on the dataset are achieved by fine-tuned MuRIL with the quantitative measures of repetition and intent to harm incorporated: a precision of 0.93, a recall of 0.92, and an F-measure of 0.92.
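
The thresholding step described in the abstract can be illustrated with a minimal sketch. The abstract does not define the repetition or intent-to-harm measures, the threshold values, or how the three checks are combined, so everything below is an assumption made for illustration: the names `ConversationScores`, `Thresholds`, and `is_cyberbullying` are hypothetical, the scores are assumed to be normalized to [0, 1], the cut-offs of 0.5 are placeholders, and the conjunction of all three checks is only one plausible reading of "integrating all three aspects".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConversationScores:
    """Hypothetical per-conversation scores, each assumed to lie in [0, 1]."""
    aggression: float      # e.g. a classifier's probability that the text is aggressive
    repetition: float      # assumed normalized measure of repeated aggression
    intent_to_harm: float  # assumed normalized measure of intent to cause harm


@dataclass(frozen=True)
class Thresholds:
    """Placeholder cut-offs chosen for illustration; not values from the paper."""
    aggression: float = 0.5
    repetition: float = 0.5
    intent_to_harm: float = 0.5


def is_cyberbullying(scores: ConversationScores,
                     thresholds: Thresholds = Thresholds()) -> bool:
    """Flag a conversation only if all three measures reach their thresholds.

    Requiring all three at once is an assumption about how the aspects
    are integrated, not the authors' documented rule.
    """
    return (
        scores.aggression >= thresholds.aggression
        and scores.repetition >= thresholds.repetition
        and scores.intent_to_harm >= thresholds.intent_to_harm
    )


# Aggressive, repeated, and harmful in intent: flagged.
flagged = is_cyberbullying(ConversationScores(0.9, 0.8, 0.7))

# Aggressive but a one-off message (low repetition): not flagged.
one_off = is_cyberbullying(ConversationScores(0.9, 0.1, 0.7))
```

Under these assumptions, a single aggressive message is not labelled as cyberbullying unless the repetition and intent-to-harm scores also reach their thresholds, which mirrors the abstract's distinction between aggression detection and cyberbullying classification.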

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
