Addressing cyberbullying in Urdu tweets: a comprehensive dataset and detection system

Farah Adeeba; Muhammad Irfan Yousuf; Izza Anwer; Sardar Umair Tariq; Abdullah Ashfaq; Malik Naqeeb

doi:10.7717/peerj-cs.1963

PeerJ Computer Science (Apr 2024)

Affiliations

Farah Adeeba: Department of Computer Science, University of Engineering and Technology Lahore, Lahore, Punjab, Pakistan
Muhammad Irfan Yousuf: Department of Computer Science, University of Engineering and Technology Lahore, Lahore, Punjab, Pakistan
Izza Anwer: Department of Transportation Engineering and Management, University of Engineering and Technology Lahore, Lahore, Punjab, Pakistan
Sardar Umair Tariq: Department of Computer Science, University of Engineering and Technology Lahore, Lahore, Punjab, Pakistan
Abdullah Ashfaq: Department of Computer Science, University of Engineering and Technology Lahore, Lahore, Punjab, Pakistan
Malik Naqeeb: Department of Computer Science, University of Engineering and Technology Lahore, Lahore, Punjab, Pakistan

DOI: https://doi.org/10.7717/peerj-cs.1963
Journal volume & issue: Vol. 10, p. e1963

Abstract

Cyberbullying has reached an alarming prevalence: approximately 54% of teenagers experience some form of it, including offensive hate speech, threats, and racism. This research introduces a comprehensive dataset and system for cyberbullying detection in Urdu tweets, leveraging a spectrum of machine learning approaches that spans traditional models and advanced deep learning techniques. The study has three objectives. First, a dataset of 12,500 annotated Urdu tweets is created and made publicly available to the research community. Second, annotation guidelines for Urdu text, with appropriate labels for cyberbullying detection, are developed. Finally, a series of experiments is conducted to assess the performance of machine learning and deep learning techniques in detecting cyberbullying. The results indicate that fastText deep learning models outperform the other models evaluated. The study demonstrates the system's efficacy in detecting and classifying cyberbullying incidents in Urdu tweets, contributing to the broader effort of creating a safer digital environment.

Published in PeerJ Computer Science

ISSN: 2376-5992 (Online)
Publisher: PeerJ Inc.
Country of publisher: United States
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://peerj.com/computer-science/