CommIT Journal (Jun 2022)
An Explainable AI Model for Hate Speech Detection on Indonesian Twitter
Abstract
To avoid citizen disputes, hate speech on social media, such as Twitter, must be automatically detected. The current research in Indonesian Twitter focuses on developing better hate speech detection models. However, there is limited study on the explainability aspects of hate speech detection. The research aims to explain issues that previous researchers have not detailed and attempt to answer the shortcomings of previous researchers. There are 13,169 tweets in the dataset with labels like “hate speech” and “abusive language”. The dataset also provides binary labels on whether hate speech is directed to individual, group, religion, race, physical disability, and gender. In the research, classification is performed by using traditional machine learning models, and the predictions are evaluated using an Explainable AI model, such as Local Interpretable Model-Agnostic Explanations (LIME), to allow users to comprehend why a tweet is regarded as a hateful message. Moreover, models that perform well in classification perceive incorrect words as contributing to hate speech. As a result, such models are unsuitable for deployment in the real world. In the investigation, the combination of XGBoost and logical LIME explanations produces the most logical results. The use of the Explainable AI model highlights the importance of choosing the ideal model while maintaining users’ trust in the deployed model.
Keywords