A Deep Learning Framework for the Detection of Malay Hate Speech

Krishanu Maity; Shaubhik Bhattacharya; Sriparna Saha; Manjeevan Seera

doi:10.1109/ACCESS.2023.3298808

IEEE Access (Jan 2023)

Affiliations

Krishanu Maity: CSE Department, Indian Institute of Technology Patna, Bihta, India
Shaubhik Bhattacharya: CSE Department, Indian Institute of Technology Patna, Bihta, India
Sriparna Saha: CSE Department, Indian Institute of Technology Patna, Bihta, India
Manjeevan Seera: Department of Econometrics and Business Statistics, School of Business, Monash University Malaysia, Subang Jaya, Selangor Darul Ehsan, Malaysia

DOI: https://doi.org/10.1109/ACCESS.2023.3298808
Journal volume & issue: Vol. 11
pp. 79542 – 79552

Abstract

Although social media platforms disseminate information efficiently, they also facilitate the spread of online abuse, harassment, and hate speech. In 2019, the United Nations Secretary-General introduced the United Nations Strategy and Plan of Action on Hate Speech in response to the alarming global rise in hate speech. Preventing hate speech is crucial because it can have severe negative effects on both individuals and society. While much research has addressed the detection of online hate speech in English, little has been done for other languages, such as Malay. In this paper, we present HateM, the first benchmark dataset for detecting hate speech in Malay, comprising over 4,892 annotated tweets. We developed a two-channel deep learning model, XLCaps, to handle noisy Malay-language posts effectively. One channel takes the XLNet language model as input, followed by a capsule network, while the other takes FastText embeddings followed by a Bi-GRU. The proposed model surpasses the baseline models in overall accuracy and F1 score, reaching 80.69% and 80.41%, respectively. This work contributes to the prevention of hate speech in Malay and can serve as a basis for future study in this area. The approach to handling noisy Malay posts can also be applied to other languages. The code and dataset are available at https://github.com/MaityKrishanu/Hate_Malay.
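For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how accuracy and F1 can be computed for a binary hate/non-hate task. It is not taken from the authors' repository: the function names and the toy labels are hypothetical, and macro-averaging is an assumption, since the abstract does not state which F1 variant was used.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_label(y_true, y_pred, label):
    """F1 score treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, l) for l in labels) / len(labels)


# Hypothetical toy labels (1 = hate, 0 = non-hate); not drawn from HateM.
gold = [1, 0, 1, 1, 0, 0, 1, 0]
pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(f"accuracy = {accuracy(gold, pred):.2%}")  # accuracy = 75.00%
print(f"macro F1 = {macro_f1(gold, pred):.2%}")  # macro F1 = 75.00%
```

Accuracy counts every correct prediction equally, whereas F1 balances precision and recall per class, which is why both are often reported together when class sizes may differ.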

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
