Improving Hate Speech Detection Using Double-Layers Hybrid CNN-RNN Model on Imbalanced Dataset

Slamet Riyadi; Annisa Divayu Andriyani; Siti Noraini Sulaiman

doi:10.1109/ACCESS.2024.3487433

IEEE Access (Jan 2024)

Affiliations

Slamet Riyadi: Department of Information Technology, Faculty of Engineering, Universitas Muhammadiyah Yogyakarta, Yogyakarta, Indonesia
Annisa Divayu Andriyani: Department of Information Technology, Faculty of Engineering, Universitas Muhammadiyah Yogyakarta, Yogyakarta, Indonesia
Siti Noraini Sulaiman: Center of Electrical Engineering, College of Engineering, Universiti Teknologi MARA Cawangan Pulau Pinang, Permatang Pauh, Penang, Malaysia

DOI: https://doi.org/10.1109/ACCESS.2024.3487433
Journal volume & issue: Vol. 12
pp. 159660 – 159668

Abstract

Hate speech detection is crucial in curbing online toxicity and fostering a safer digital environment. Previous research has proposed the use of a hybrid CNN-RNN model for this purpose. This study aims to improve the performance of the hybrid CNN-RNN method by using a double-layer approach to address imbalanced datasets. The novelty lies in using double layers of hybrid CNN-RNN to enhance hate speech detection accuracy. This research also employed an oversampling technique alongside the double-layer model. The process included preprocessing, feature extraction, training and tuning, testing, and performance evaluation. The results demonstrated that, with imbalanced data, the double-layer hybrid CNN-RNN model achieved an accuracy of 0.827, a precision of 0.797, a recall of 0.759, and an F1 score of 0.883. With balanced data, it achieved a higher accuracy of 0.908, a precision of 0.943, a recall of 0.894, and an F1 score of 0.914. Moreover, on the imbalanced dataset the proposed model outperformed the hybrid CNN-RNN, which achieved an accuracy of 0.752, a precision of 0.797, a recall of 0.559, and an F1 score of 0.657. Dropout and early stopping were used to address overfitting, a common problem with complex models and large datasets. This research advances hate speech detection methodologies by demonstrating the effectiveness of a double-layer hybrid CNN-RNN model, especially for imbalanced data. It underscores the importance of addressing imbalanced datasets for improved model accuracy. Future work could explore alternative data augmentation techniques or compare the proposed model with other architectures.
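
The abstract mentions an oversampling technique but does not say which one. As an illustration only, the sketch below assumes plain random oversampling, in which minority-class examples are duplicated until every class matches the size of the largest one. The function name `random_oversample` and the toy data are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate minority-class samples until all classes have equal size.

    Assumption: simple random oversampling with replacement. The paper's
    actual oversampling method is not specified in the abstract.
    """
    rng = random.Random(seed)

    # Group the samples by class label.
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_class.values())

    pairs = []
    for label, items in by_class.items():
        # Draw the missing samples at random, with replacement.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        pairs.extend((text, label) for text in items + extra)

    # Shuffle so that duplicated samples are not clustered together.
    rng.shuffle(pairs)
    return [t for t, _ in pairs], [l for _, l in pairs]


if __name__ == "__main__":
    toy_texts = ["a", "b", "c", "d", "e"]
    toy_labels = [0, 0, 0, 0, 1]  # class 1 is the minority
    _, balanced_labels = random_oversample(toy_texts, toy_labels)
    print(Counter(balanced_labels))
```

In practice, oversampling of this kind is usually applied to the training split only, so that duplicated examples do not leak into the test set.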

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
