IEEE Access (Jan 2024)
From Text to Insight: An Integrated CNN-BiLSTM-GRU Model for Arabic Cyberbullying Detection
Abstract
Several studies on cyberbullying detection have employed various machine learning and deep learning methods and reported promising results. Nevertheless, most of them have used English data for both training and testing, and only a few have considered native languages such as Arabic. There is therefore a critical need to address cyberbullying in its native linguistic context. The data used in this research were compiled from various Kaggle and GitHub repositories: six benchmark datasets collected from Facebook, Twitter, and Instagram, together with a developed Arabic cyberbullying lexicon, were used to evaluate the efficiency of the proposed hybrid model. Before classification, the text was cleaned and preprocessed, and word embeddings were used to represent it. Numerous machine learning and deep learning algorithms were assessed in a detailed comparative analysis: naïve Bayes, support vector machines, k-nearest neighbors, decision trees, random forest, multi-layer perceptron neural networks, convolutional neural networks, recurrent neural networks, long short-term memory, bidirectional long short-term memory, and gated recurrent units. Because hybrid techniques have shown promise for detecting cyberbullying, the best-performing algorithms were used to construct the hybrid model. This research introduces a hybrid CNN-BiLSTM-GRU deep learning model with stacked word embeddings, which consistently outperforms the single models in cyberbullying detection. We extensively investigated the performance of the proposed hybrid model across diverse data contexts. Through thorough study and validation, the proposed hybrid model demonstrates enhanced feature extraction and accurate text classification.
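The abstract mentions a data-cleaning step before classification but does not list its operations. As a minimal sketch, assuming typical normalization rules for Arabic social-media text (removing URLs, user mentions, diacritics, and tatweel; unifying alef variants; dropping non-Arabic characters; collapsing elongated letters), such a step could look like the following. The function name `clean_arabic` and every rule in it are hypothetical illustrations, not the authors' actual pipeline.

```python
import re

# Hypothetical cleaning rules (assumptions): the paper does not specify its exact steps.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")                      # \w is Unicode-aware, so Arabic handles match too
DIACRITICS_RE = re.compile(r"[\u064B-\u0652\u0670]")  # tashkeel marks and superscript alef
TATWEEL = "\u0640"                                    # Arabic elongation character
NON_ARABIC_RE = re.compile(r"[^\u0621-\u064A\s]")     # keep Arabic letters and whitespace only
REPEAT_RE = re.compile(r"(.)\1{2,}")                  # three or more repeats of one character

# Map alef with madda / hamza above / hamza below to bare alef.
ALEF_VARIANTS = str.maketrans({"\u0622": "\u0627", "\u0623": "\u0627", "\u0625": "\u0627"})


def clean_arabic(text: str) -> str:
    """Normalize one Arabic post under the assumed rules above."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    # Remove diacritics before filtering, so they are deleted rather than turned into spaces.
    text = DIACRITICS_RE.sub("", text)
    text = text.replace(TATWEEL, "")
    text = text.translate(ALEF_VARIANTS)
    text = NON_ARABIC_RE.sub(" ", text)
    # Collapse elongated letters such as repeated yeh to a single character.
    text = REPEAT_RE.sub(r"\1", text)
    # Squeeze whitespace left behind by the substitutions.
    return " ".join(text.split())


if __name__ == "__main__":
    sample = "\u0623\u0646\u062a\u064e \u063a\u0628\u064a\u064a\u064a\u064a!!! http://t.co/x @user"
    print(clean_arabic(sample))
```

Collapsing only runs of three or more identical characters is a deliberate choice in this sketch: it removes emphatic elongation common in abusive posts while leaving legitimately doubled letters intact.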
Keywords