Journal of King Saud University: Computer and Information Sciences (Feb 2024)

A hybrid combination of CNN Attention with optimized random forest with grey wolf optimizer to discriminate between Arabic hateful, abusive tweets

  • Abeer Aljohani,
  • Nawaf Alharbe,
  • Rabia Emhamed Al Mamlook,
  • Mashael M. Khayyat

Journal volume & issue
Vol. 36, no. 2
p. 101961

Abstract

Arabic hateful speech recognition has long been a major area of focus in Natural Language Processing (NLP) research. With recent advances in transformer models and deep learning, researchers are turning to transfer learning based on existing models such as BERT for this task. Detecting hateful content in Arabic requires advanced machine learning algorithms and NLP techniques, which can identify different forms of hateful content by analyzing text for lexical, semantic, and syntactic features. In this research, we propose a new hybrid approach that combines deep learning and machine learning models to detect hateful and abusive content in Arabic. The proposed model combines convolutional neural networks (CNNs) with attention layers that are trained to differentiate between normal, abusive, and hateful content in Arabic. In the first step, we use a pre-trained model to extract features from the Arabic text. We then classify the extracted features with an optimized random forest combined with particle swarm optimization and the grey wolf optimizer. Finally, we evaluate the model's performance in detecting hateful Arabic content. The evaluation uses two sets of 5846 and 6023 tweets, labeled with three categories: hateful, abusive, and normal. On the 5846 tweets, CNN Attention with the random forest optimized by the grey wolf optimizer achieves 97.16% accuracy, 97.15% F1-score, 97.17% precision, and 97.13% sensitivity. On the 6023 tweets, the same configuration achieves 97.83% accuracy, 97.83% F1-score, 97.84% precision, and 97.83% sensitivity.
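
To illustrate the optimization step named in the abstract, here is a minimal sketch of a basic grey wolf optimizer in plain Python. It is not the authors' implementation. The function name `grey_wolf_optimize`, the toy objective, and the parameter bounds are all assumptions made for illustration. In a setting like the one described, the objective would presumably be the cross-validated error of a random forest trained on the extracted features; a simple quadratic stands in for it here.

```python
import random


def grey_wolf_optimize(objective, bounds, n_wolves=12, n_iters=60, seed=0):
    """Minimise `objective` over the box `bounds` with a basic grey wolf optimizer.

    Illustrative sketch only; requires n_wolves >= 3.
    """
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    best_pos, best_val = None, float("inf")

    for it in range(n_iters):
        # Rank the pack: the three best wolves (alpha, beta, delta) lead the search.
        ranked = sorted(wolves, key=objective)
        leaders = [list(w) for w in ranked[:3]]
        val = objective(leaders[0])
        if val < best_val:
            best_pos, best_val = list(leaders[0]), val

        a = 2.0 * (1 - it / n_iters)  # decays linearly from 2 towards 0
        new_wolves = []
        for wolf in wolves:
            pos = []
            for d, (lo, hi) in enumerate(bounds):
                estimate = 0.0
                for leader in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    dist = abs(C * leader[d] - wolf[d])
                    estimate += leader[d] - A * dist
                # Average the three leader-guided estimates and clip to the bounds.
                pos.append(min(hi, max(lo, estimate / 3)))
            new_wolves.append(pos)
        wolves = new_wolves

    # Account for the final pack, which the loop has not yet evaluated.
    final = min(wolves, key=objective)
    if objective(final) < best_val:
        best_pos, best_val = list(final), objective(final)
    return best_pos, best_val


# Hypothetical stand-in for a validation error over two random forest
# hyperparameters (number of trees, maximum depth). Not taken from the paper.
def toy_objective(params):
    n_trees, max_depth = params
    return ((n_trees - 200) / 100) ** 2 + ((max_depth - 12) / 10) ** 2


bounds = [(10, 500), (2, 40)]  # assumed search ranges
best_pos, best_val = grey_wolf_optimize(toy_objective, bounds)
```

Continuous positions would need rounding before being passed to a real random forest, since the number of trees and the depth are integers. The sketch leaves that step out.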

Keywords