Comparative Analysis of Word Embeddings for Multiclass Cyberbullying Detection

Azhi Faraj; Semih Utku

doi:10.21928/uhdjst.v8n1y2024.pp55-63

UHD Journal of Science and Technology (Feb 2024)

Affiliations

Azhi Faraj: Department of Computer Engineering, Faculty of Engineering, Dokuz Eylul University, Izmir, Turkey; Department of Information Technology, College of Commerce, Sulaimani University, Sulaymaniyah, Iraq
Semih Utku: Department of Computer Engineering, Faculty of Engineering, Dokuz Eylul University, Izmir, Turkey

Journal volume & issue: Vol. 8, no. 1
pp. 55 – 63

Abstract

Cyberbullying has emerged as a pervasive concern in modern society, particularly on social media platforms. It involves using digital communication to instill fear in, threaten, harass, or harm individuals. Given the prevalence of social media in our lives, there is a growing need for effective methods to detect and combat cyberbullying. This paper explores the use of word embeddings and compares the effectiveness of trainable word embeddings, pre-trained word embeddings, and fine-tuned language models for multiclass cyberbullying detection. Unlike previous work centered on binary classification, this research addresses the more nuanced task of multiclass cyberbullying detection. Word embeddings are promising for this task because they transform words into dense numerical vectors in a high-dimensional space. These vectors capture semantic and syntactic relationships in language, enabling machine learning (ML) algorithms to recognize patterns that may signal cyberbullying. The research employs diverse techniques, including convolutional neural networks and bidirectional long short-term memory networks, alongside well-known pre-trained models such as word2vec and bidirectional encoder representations from transformers (BERT). In addition, traditional ML algorithms such as K-nearest neighbors, Random Forest, and Naïve Bayes are evaluated against the deep learning models. The findings show that a BERT model fine-tuned on our dataset gives the best results for multiclass cyberbullying detection, achieving the highest recorded accuracy of 85% on the dataset.
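
To illustrate how dense word vectors can feed a traditional classifier, the following is a minimal sketch, not the authors' pipeline. The abstract does not say how features were built for K-nearest neighbors, Random Forest, or Naïve Bayes; the sketch assumes one common choice, averaging a sentence's word vectors, and classifies with k-NN under cosine similarity. The embedding table `EMBEDDINGS`, the class labels, and the training sentences in `TRAIN` are hypothetical toy values; a real system would load pre-trained vectors such as word2vec or learn the table during training.

```python
import math
from collections import Counter

# Hypothetical 3-dimensional toy embeddings (NOT real word2vec vectors).
EMBEDDINGS = {
    "stupid": [0.9, 0.1, 0.0],
    "idiot": [0.8, 0.2, 0.1],
    "old": [0.1, 0.9, 0.0],
    "boomer": [0.2, 0.8, 0.1],
    "thanks": [0.0, 0.1, 0.9],
    "great": [0.1, 0.0, 0.8],
}
DIM = 3

# Hypothetical labelled sentences for three illustrative classes.
TRAIN = [
    ("you stupid idiot", "insult"),
    ("what an idiot", "insult"),
    ("stupid post", "insult"),
    ("ok boomer", "age"),
    ("too old boomer", "age"),
    ("old people", "age"),
    ("thanks great work", "not_cyberbullying"),
    ("great game", "not_cyberbullying"),
    ("thanks friend", "not_cyberbullying"),
]


def sentence_vector(text):
    """Average the vectors of known tokens; out-of-vocabulary tokens are skipped."""
    vectors = [EMBEDDINGS[tok] for tok in text.lower().split() if tok in EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_predict(text, train, k=3):
    """Return the majority label among the k most similar training sentences."""
    query = sentence_vector(text)
    ranked = sorted(
        train,
        key=lambda example: cosine(query, sentence_vector(example[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(knn_predict("such an idiot", TRAIN))  # → insult
```

Averaging discards word order, which is one reason sequence models such as CNNs and BiLSTMs, and contextual models such as BERT, can pick up cues that a baseline like this misses.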

Published in UHD Journal of Science and Technology

ISSN: 2521-4209 (Print); 2521-4217 (Online)
Publisher: University of Human Development
Country of publisher: Iraq
LCC subjects: Science
Website: http://journals.uhd.edu.iq/index.php/uhdjst/index