Multimodal Hate Speech Detection in Memes Using Contrastive Language-Image Pre-Training

Greeshma Arya; Mohammad Kamrul Hasan; Ashish Bagwari; Nurhizam Safie; Shayla Islam; Fatima Rayan Awad Ahmed; Aaishani De; Muhammad Attique Khan; Taher M. Ghazal

doi:10.1109/ACCESS.2024.3361322

IEEE Access (Jan 2024)

Affiliations

Greeshma Arya: Department of Electronics and Communication Engineering, Indira Gandhi Delhi Technical University for Women, New Delhi, India
Mohammad Kamrul Hasan: Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia (UKM), Bangi, Selangor, Malaysia
Ashish Bagwari: Department of Electronics and Communication Engineering, Uttarakhand Technical University, Dehradun, India
Nurhizam Safie: Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia (UKM), Bangi, Selangor, Malaysia
Shayla Islam: Institute of Computer Science and Digital Innovation, UCSI University Malaysia, Kuala Lumpur, Malaysia
Fatima Rayan Awad Ahmed: Computer Science Department, College of Computer Engineering and Science, Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia
Aaishani De: Department of Electronics and Communication Engineering, Indira Gandhi Delhi Technical University for Women, New Delhi, India
Muhammad Attique Khan: Department of Computer Science, HITEC University, Taxila, Pakistan
Taher M. Ghazal: Computer Science Department, Centre for Cyber Physical Systems, Khalifa University, Abu Dhabi, United Arab Emirates

Journal volume & issue: Vol. 12
pp. 22359 – 22375

Abstract

In contemporary society, the proliferation of hateful messages online has become a pressing concern, harming both the social fabric and individual well-being. Automatically detecting such content with models trained to recognize it holds promise for mitigating its harmful impact. However, "hateful memes" pose new challenges for detection, particularly for deep learning models. These memes pair a textual element with an image; each component may be innocuous on its own, but their combination is harmful. Consequently, entities that disseminate information on the web are compelled to institute mechanisms that regulate and automatically filter out such content. Identifying hateful memes effectively requires models with robust vision-language fusion capabilities that can reason across modalities. This research introduces a novel approach that leverages the multimodal Contrastive Language-Image Pre-Training (CLIP) model, fine-tuned with prompt engineering. The method achieves an accuracy of 87.42%. Loss, AUROC, and F1 score are also computed and support the efficacy of the proposed strategy. Our findings suggest that this approach offers an efficient means of regulating the spread of hate speech through viral memes on social networking platforms, thereby contributing to a safer online environment.
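
The abstract does not specify how the prompts are worded or how the image and caption are combined, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It illustrates two ideas the abstract relies on: CLIP-style prompt scoring (cosine similarity between L2-normalized image and text embeddings, scaled by a temperature and passed through a softmax) and the standard definitions of the reported metrics (accuracy, binary cross-entropy loss, F1, AUROC). The toy 3-dimensional embeddings, the example prompt wording, the logit scale of 100, and the 0.5 decision threshold are all hypothetical placeholders; in a real pipeline the embeddings would come from CLIP's image and text encoders.

```python
import math

def normalize(v):
    # L2-normalize a vector so that dot products become cosine similarities
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def prompt_probs(image_emb, prompt_embs, logit_scale=100.0):
    # CLIP-style scoring: cosine similarity between the image embedding and each
    # prompt embedding, multiplied by an (assumed) logit scale, then a softmax
    img = normalize(image_emb)
    logits = [logit_scale * sum(a * b for a, b in zip(img, normalize(p)))
              for p in prompt_embs]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def binary_metrics(labels, scores, threshold=0.5):
    # labels: 1 = hateful, 0 = not hateful; scores: predicted P(hateful)
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    accuracy = sum(1 for y, p in zip(labels, preds) if y == p) / len(labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Binary cross-entropy, with probabilities clipped to avoid log(0)
    eps = 1e-12
    loss = -sum(y * math.log(max(s, eps)) + (1 - y) * math.log(max(1 - s, eps))
                for y, s in zip(labels, scores)) / len(labels)
    # AUROC: probability that a random positive outranks a random negative (ties = 0.5)
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auroc = wins / (len(pos) * len(neg))
    return {"accuracy": accuracy, "f1": f1, "loss": loss, "auroc": auroc}

# Hypothetical embeddings standing in for CLIP encoder outputs
image = [0.9, 0.1, 0.3]
prompts = [[1.0, 0.0, 0.2],   # stands in for a prompt such as "a hateful meme"
           [0.0, 1.0, 0.1]]   # stands in for a prompt such as "a harmless meme"
p_hateful = prompt_probs(image, prompts)[0]

# Hypothetical labels and scores for four memes
metrics = binary_metrics([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6])
# accuracy 0.5, F1 0.5, AUROC 0.75
```

The pairwise AUROC formulation is quadratic in the number of examples and is used here only for readability; a library routine such as scikit-learn's `roc_auc_score` is the usual choice on a full test set.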

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639