IEEE Access (Jan 2021)

Advances in Machine Learning Algorithms for Hate Speech Detection in Social Media: A Review

  • Nanlir Sallau Mullah
  • Wan Mohd Nazmee Wan Zainon

DOI
https://doi.org/10.1109/ACCESS.2021.3089515
Journal volume & issue
Vol. 9
pp. 88364 – 88376

Abstract

The aim of this paper is to review machine learning (ML) algorithms and techniques for hate speech detection in social media (SM). The hate speech problem is normally modeled as a text classification task. In this study, we examine the baseline components of hate speech classification using ML algorithms. Five baseline components are reviewed: data collection and exploration, feature extraction, dimensionality reduction, classifier selection and training, and model evaluation. The ML algorithms employed for hate speech detection have improved over time, and new datasets and different performance metrics have been proposed in the literature. To keep researchers informed of these trends in the automatic detection of hate speech, a comprehensive and up-to-date review of the state of the art is needed. The contributions of this study are three-fold. First, it equips readers with the necessary information on the critical steps involved in hate speech detection using ML algorithms. Second, the strengths and weaknesses of each method are critically evaluated to guide researchers in choosing an algorithm. Lastly, some research gaps and open challenges are identified. The variants of ML techniques reviewed include classical ML, ensemble approaches, and deep learning methods. The study is intended to benefit researchers and professionals alike.

Keywords