IEEE Access (Jan 2024)

Improving Hate Speech Detection Using Double-Layers Hybrid CNN-RNN Model on Imbalanced Dataset

  • Slamet Riyadi,
  • Annisa Divayu Andriyani,
  • Siti Noraini Sulaiman

DOI
https://doi.org/10.1109/ACCESS.2024.3487433
Journal volume & issue
Vol. 12
pp. 159660 – 159668

Abstract

Hate speech detection is crucial in curbing online toxicity and fostering a safer digital environment. Previous research has proposed a hybrid CNN-RNN model for this purpose. This study aims to improve the performance of the hybrid CNN-RNN method on imbalanced datasets by using a double-layer approach. The novelty lies in using double layers of hybrid CNN-RNN to enhance hate speech detection accuracy. This research also employed an oversampling technique alongside the double-layer model. The process included preprocessing, feature extraction, training and tuning, testing, and performance evaluation. On imbalanced data, the double-layer hybrid CNN-RNN model achieved an accuracy of 0.827, a precision of 0.797, a recall of 0.759, and an F1 score of 0.883. On balanced data, it achieved a higher accuracy of 0.908, a precision of 0.943, a recall of 0.894, and an F1 score of 0.914. Moreover, on the imbalanced dataset the proposed model outperformed the original hybrid CNN-RNN, which achieved an accuracy of 0.752, a precision of 0.797, a recall of 0.559, and an F1 score of 0.657. Dropout and early stopping were applied to address the overfitting that arises with complex models and large datasets. This research advances hate speech detection methodologies by demonstrating the effectiveness of a double-layer hybrid CNN-RNN model, especially for imbalanced data. It underscores the importance of addressing imbalanced datasets for improved model accuracy. Future work could explore alternative data augmentation techniques or compare the proposed model with other architectures.
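
The abstract does not state which oversampling technique was used. As an illustration only, the following is a minimal sketch of one common option, random oversampling, in which minority-class samples are duplicated at random until every class matches the size of the largest one. The function name `random_oversample`, the toy texts, and the label encoding (1 for hate speech, 0 otherwise) are assumptions made for this example and are not taken from the paper.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until
    every class has as many samples as the largest class.

    Hypothetical helper for illustration; not the authors' code.
    """
    rng = random.Random(seed)

    # Group the samples by class label.
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_class.values())

    pairs = []
    for label, items in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        pairs.extend((text, label) for text in items + extra)

    # Shuffle so that classes are not grouped together during training.
    rng.shuffle(pairs)
    out_texts = [text for text, _ in pairs]
    out_labels = [label for _, label in pairs]
    return out_texts, out_labels


# Toy imbalanced data (assumed encoding: 1 = hate speech, 0 = not).
texts = ["post a", "post b", "post c", "post d", "post e"]
labels = [0, 0, 0, 0, 1]

balanced_texts, balanced_labels = random_oversample(texts, labels)
print(Counter(balanced_labels))
```

In practice, oversampling of this kind is normally applied to the training split only, so that duplicated samples do not leak into the test set and inflate the reported metrics.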

Keywords