Detection of hate: speech tweets based convolutional neural network and machine learning algorithms

Hameda A. Sennary; Ghada Abozaid; Ashraf Hemeida; Alexey Mikhaylov; Tamara Broderick

doi:10.1038/s41598-024-76632-2

Scientific Reports (Nov 2024)

Affiliations

Hameda A. Sennary: Department of Mathematics, Faculty of Science, Aswan University
Ghada Abozaid: Electrical Engineering Department, Faculty of Energy Engineering, Aswan University
Ashraf Hemeida: Electrical Engineering Department, Faculty of Energy Engineering, Aswan University
Alexey Mikhaylov: Department of Financial Technologies, Financial University Under the Government of the Russian Federation
Tamara Broderick: Laboratory for Information and Decision Systems, Massachusetts Institute of Technology

Journal volume & issue: Vol. 14, no. 1, pp. 1–15

Abstract

There is no doubt that social media sites have provided many benefits to humanity, such as sharing information continuously and communicating with others easily. Alongside these advantages, however, there are disadvantages for which solutions are still being sought. One of these disadvantages is the sharing of hate speech. In this study, we address this problem by applying a Term Frequency-Inverse Document Frequency (TF-IDF) based approach to feature engineering with eleven machine learning and deep learning classifiers that can automatically identify hate speech. Three datasets were used: the first is "Hate speech offensive tweets by Davidson et al.", the second is "Twitter hate speech", and the third was obtained by merging the second dataset with the "Cyberbullying dataset" (toxicity_parsed_dataset). The classifiers involved are Logistic Regression (LR), Naive Bayes (NB), Multi-layer Perceptron (MLP), Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbor (KNN), K-Means, Decision Tree (DT), Gradient Boosting Classifier (GBC), and Extra Trees (ET), in addition to a convolutional neural network (CNN). The maximum accuracy attained exceeded 99%.
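
As a rough illustration of the TF-IDF feature engineering mentioned in the abstract, the sketch below computes TF-IDF weights for a few made-up sentences using only the Python standard library. It assumes the textbook unsmoothed form (term frequency times log of inverse document frequency) and whitespace tokenization; the abstract does not state which TF-IDF variant, tokenizer, or normalization the authors used, so the function name `tfidf` and the toy sentences are hypothetical and are not taken from the paper or its datasets.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (unsmoothed TF-IDF)."""
    # Assumption: lowercase + whitespace split stands in for real tweet preprocessing.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            # tf = relative frequency in this document; idf = log(N / df).
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Toy, made-up sentences (not drawn from the datasets named in the abstract).
toy_tweets = [
    "have a nice day",
    "what a nice goal",
    "you are awful",
]
vectors = tfidf(toy_tweets)
for text, vec in zip(toy_tweets, vectors):
    print(text, {term: round(weight, 3) for term, weight in vec.items()})
```

Terms that occur in fewer documents receive larger weights, and a term present in every document receives a weight of zero. Vectors of this kind would then be passed to a classifier such as those listed in the abstract.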

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/
