From Text to Insight: An Integrated CNN-BiLSTM-GRU Model for Arabic Cyberbullying Detection

Eman-Yaser Daraghmi; Sajida Qadan; Yousef-Awwad Daraghmi; Rami Yousuf; Omar Cheikhrouhou; Mohammed Baz

doi:10.1109/ACCESS.2024.3431939

IEEE Access (Jan 2024)

Affiliations

Eman-Yaser Daraghmi: Department of Computer Science, Palestine Technical University—Kadoorie, Tulkarm, Palestine
Sajida Qadan: Faculty of Graduate Studies, Palestine Technical University—Kadoorie, Tulkarm, Palestine
Yousef-Awwad Daraghmi: Department of Computer Systems Engineering, Palestine Technical University—Kadoorie, Tulkarm, Palestine
Rami Yousuf: Department of Computer Systems Engineering, Palestine Technical University—Kadoorie, Tulkarm, Palestine
Omar Cheikhrouhou: Higher Institute of Computer Science of Mahdia, University of Monastir, Mahdia, Tunisia
Mohammed Baz: Department of Computer Engineering, College of Computers and Information Technology, Taif University, Taif, Saudi Arabia

DOI: https://doi.org/10.1109/ACCESS.2024.3431939
Journal volume & issue: Vol. 12
pp. 103504 – 103519

Abstract

Several studies on cyberbullying detection have employed different deep learning and machine learning methodologies to achieve promising outcomes. Nevertheless, most of them have concentrated on English data for both training and testing, with only a limited number considering native languages such as Arabic. Thus, there is a critical need to address cyberbullying in its native linguistic context. The data used in this research were compiled from various Kaggle and GitHub repositories. Six benchmark datasets collected from Facebook, Twitter, and Instagram, together with a developed Arabic cyberbullying lexicon, were used to evaluate the efficiency of the proposed hybrid model. Prior to classification, the text was cleaned and preprocessed, and word embedding was applied as a natural language processing method. Numerous machine learning and deep learning algorithms were assessed in a detailed comparative analysis: naïve Bayes, support vector machines, k-nearest neighbors, decision trees, random forest, multi-layer perceptron neural networks, convolutional neural networks, recurrent neural networks, long short-term memory, bidirectional long short-term memory, and gated recurrent units. Given their demonstrated potential, hybrid techniques have emerged as a promising approach for effectively detecting instances of cyberbullying. Thus, the best-performing algorithms are used to construct the hybrid model. This research introduces a hybrid deep learning model with stacked word embedding. This model consistently outperforms single models in cyberbullying detection. We extensively investigated the performance of the proposed hybrid model across diverse data contexts. Through thorough study and validation, the proposed hybrid model demonstrates enhanced capabilities in feature extraction and accurate text classification.
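The abstract mentions a data cleaning step before classification but does not list the operations involved. As a rough illustration only, the sketch below applies normalizations that are common for Arabic social media text: dropping URLs and mentions, removing diacritics and tatweel, unifying alef variants, and discarding non-Arabic characters. The function name `clean_arabic_text` and the choice of steps are assumptions made for this example, not the authors' documented pipeline.

```python
import re

# All patterns below are assumed cleaning steps; the abstract does not
# specify which operations the authors actually applied.
URL_OR_MENTION = re.compile(r"https?://\S+|www\.\S+|@\w+")
DIACRITICS = re.compile(r"[\u064B-\u0652]")          # tashkeel marks (fathatan .. sukun)
TATWEEL = "\u0640"                                   # elongation character
ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")  # alef with madda / hamza above / hamza below
NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")       # anything except Arabic letters and whitespace


def clean_arabic_text(text: str) -> str:
    """Return a normalized copy of `text` (hypothetical helper)."""
    text = URL_OR_MENTION.sub(" ", text)      # drop links and user mentions
    text = DIACRITICS.sub("", text)           # strip diacritics without splitting words
    text = text.replace(TATWEEL, "")          # remove elongation
    text = ALEF_VARIANTS.sub("\u0627", text)  # map alef variants to bare alef
    text = NON_ARABIC.sub(" ", text)          # remove punctuation, digits, Latin text, emoji
    return " ".join(text.split())             # collapse repeated whitespace
```

The cleaned string could then be tokenized and passed to whatever word embedding is in use. Whether to keep emoji, digits, or Latin-script words is a design choice that depends on the dataset, so the last substitution in particular should be treated as optional.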

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639