IEEE Access (Jan 2023)
A Deep Learning Framework for the Detection of Malay Hate Speech
Abstract
Although social media platforms disseminate information efficiently, they also facilitate the spread of online abuse, harassment, and hate speech. In 2019, the United Nations Secretary-General introduced the United Nations Strategy and Plan of Action on Hate Speech in response to the alarming global rise in hate speech. Preventing hate speech is crucial because it can have severe negative effects on both individuals and society. While much research has been conducted on detecting online hate speech in English, little has been done for other languages, such as Malay. In this paper, we present HateM, the first benchmark dataset for detecting hate speech in Malay, comprising over 4,892 annotated tweets. We also propose a two-channel deep learning model, XLCaps, designed to handle noisy Malay-language posts effectively. One channel feeds XLNet language model representations into a capsule network, while the other feeds FastText embeddings into a Bi-GRU. Our proposed model achieves an overall accuracy of 80.69% and an F1 score of 80.41%, surpassing the baseline models on both metrics. This work contributes to the prevention of hate speech in Malay and can serve as a basis for future study in this area. The approach to handling noisy Malay posts can also be applied to other languages. The code and dataset are available at https://github.com/MaityKrishanu/Hate_Malay.
Keywords