IEEE Access (Jan 2024)
Fine-Grained Multilingual Hate Speech Detection Using Explainable AI and Transformers
Abstract
Detecting hate speech on online platforms is essential for maintaining safe and inclusive digital environments. Although significant progress has been made in binary hate speech classification, challenges persist in multilingual and fine-grained classification. This study presents a comprehensive approach to hate speech detection across English, Urdu, and Sindhi, using deep learning models such as Bidirectional Encoder Representations from Transformers (BERT) and its multilingual variants. The research also employs Explainable Artificial Intelligence (XAI) techniques, such as Local Interpretable Model-Agnostic Explanations (LIME), to gain insights into model performance. This work contributes a curated multilingual hate speech detection dataset and a robust fine-grained hate speech detection model. The dataset includes non-hate and hate speech classes, and the hate speech class is further divided into five fine-grained categories: Disability, Gender, Nationality, Race, and Religion. The experiments yielded a 91% F-score in binary classification and an 86% weighted F-score in fine-grained hate speech detection on multilingual datasets using XLM-RoBERTa. Notably, the Religion class achieved the highest F-score, at 92%. This study is expected to help reduce the spread of hate speech written in Urdu, English, or Sindhi on social media platforms.
Keywords