The Use of a Large Language Model for Cyberbullying Detection

Bayode Ogunleye; Babitha Dharmaraj

doi:10.3390/analytics2030038

Analytics (Sep 2023)

Affiliations

Bayode Ogunleye: Department of Computing & Mathematics, University of Brighton, Brighton BN2 4GJ, UK
Babitha Dharmaraj: Department of Digital, Data and Technology, Ofgem, London E14 4PU, UK

Journal volume & issue: Vol. 2, no. 3
pp. 694–707

Abstract

The dominance of social media has given perpetrators additional channels for bullying. Cyberbullying (CB) is among the most prevalent phenomena in today’s cyber world and is a severe threat to the mental and physical health of citizens. This creates a need for a robust system that keeps bullying content off online forums, blogs, and social media platforms and so limits its impact on society. Several machine learning (ML) algorithms have been proposed for this purpose, but their performance is inconsistent because of high class imbalance and generalisation issues. In recent years, large language models (LLMs) such as BERT and RoBERTa have achieved state-of-the-art (SOTA) results in several natural language processing (NLP) tasks, yet they have not been applied extensively to CB detection. In this paper, we explored the use of these models for CB detection. We prepared a new dataset (D2) from existing studies (Formspring and Twitter). Our experimental results on datasets D1 and D2 showed that RoBERTa outperformed the other models.
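
The abstract names high class imbalance as one reason ML performance on CB detection is inconsistent. The sketch below illustrates that point only; it is not the authors' evaluation code. The helper `binary_metrics` and the toy labels (95 non-bullying posts, 5 bullying posts) are assumptions made up for illustration and are not taken from D1 or D2.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy plus precision, recall and F1 for the positive class (1 = bullying).

    Hypothetical helper written for this illustration.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # bullying, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # harmless, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # bullying, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # harmless, passed

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up, heavily imbalanced toy labels: 95 non-bullying (0), 5 bullying (1).
y_true = [0] * 95 + [1] * 5

# A degenerate classifier that never flags anything as bullying.
always_negative = [0] * 100
baseline = binary_metrics(y_true, always_negative)

# A classifier that catches 4 of the 5 bullying posts and raises 2 false alarms.
some_detector = [0] * 93 + [1] * 2 + [1] * 4 + [0] * 1
detector = binary_metrics(y_true, some_detector)

print("always negative:", baseline)
print("some detector:  ", detector)
```

On these toy labels the degenerate classifier reaches high accuracy while its positive-class F1 is zero, which is why imbalanced CB datasets are usually judged on precision, recall and F1 for the bullying class rather than on accuracy alone.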

Published in Analytics

ISSN: 2813-2203 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science; Science: Mathematics: Probabilities. Mathematical statistics
Website: https://www.mdpi.com/journal/analytics
