IEEE Access (Jan 2022)

CAMM: Cross-Attention Multimodal Classification of Disaster-Related Tweets

  • Anuradha Khattar
  • S. M. K. Quadri

DOI
https://doi.org/10.1109/ACCESS.2022.3202976
Journal volume & issue
Vol. 10
pp. 92889 – 92902

Abstract

During the past decade, social media platforms have been extensively used for information dissemination by affected communities and humanitarian agencies during disasters. Although many recent studies have classified social media posts as informative or non-informative, most are unimodal, i.e., they use textual or visual data independently to build deep learning models. In the present study, we integrate the complementary information provided by the text and image messages about the same event posted by the affected community on the social media platform Twitter and build a multimodal deep learning model based on the attention mechanism. The attention mechanism is a recent breakthrough that has revolutionized the field of deep learning: just as humans pay more attention to a specific part of a text or image while ignoring the rest, neural networks can be trained to concentrate on the more relevant features. We propose a novel Cross-Attention Multi-Modal (CAMM) deep neural network for classifying multimodal disaster data, which uses the attention mask of the textual modality to highlight the features of the visual modality. We compare CAMM with unimodal models and with the most popular bilinear multimodal models, MUTAN and BLOCK, which are generally used for visual question answering. CAMM achieves an average F1-score of 84.08%, outperforming MUTAN and BLOCK by 6.31% and 5.91%, respectively. The proposed cross-attention-based multimodal deep learning method outperforms the current state-of-the-art fusion methods on the benchmark multimodal disaster dataset by highlighting more relevant cross-domain features of text and image tweets.
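To illustrate the general idea described in the abstract (not the authors' actual CAMM implementation), the sketch below shows a text-guided cross-attention fusion in PyTorch: attention weights derived from the text representation re-weight visual region features before the two modalities are concatenated and classified as informative or non-informative. The feature dimensions, encoders implied by the comments, and layer choices are illustrative assumptions only.

```python
# Minimal sketch of text-guided cross-attention fusion (illustrative only;
# not the paper's CAMM architecture). Assumes pooled text features (e.g., from
# a BERT-style encoder) and CNN region features for the image.
import torch
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    def __init__(self, text_dim=768, image_dim=2048, hidden_dim=512, num_classes=2):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, hidden_dim)    # project text features
        self.image_proj = nn.Linear(image_dim, hidden_dim)  # project image features
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, text_feat, image_feat):
        # text_feat:  (batch, text_dim)            pooled tweet-text embedding
        # image_feat: (batch, regions, image_dim)  image region features
        t = self.text_proj(text_feat)               # (batch, hidden)
        v = self.image_proj(image_feat)             # (batch, regions, hidden)

        # Attention mask from the textual modality: score each image region
        # against the text representation, then normalize over regions.
        scores = torch.bmm(v, t.unsqueeze(2)).squeeze(2)   # (batch, regions)
        attn = torch.softmax(scores, dim=1)                # (batch, regions)

        # Text-guided visual summary: attention-weighted sum of region features.
        v_attended = torch.bmm(attn.unsqueeze(1), v).squeeze(1)  # (batch, hidden)

        # Fuse modalities and classify informative vs. non-informative.
        fused = torch.cat([t, v_attended], dim=1)
        return self.classifier(fused)


# Usage with random tensors standing in for real text/image features.
model = CrossAttentionFusion()
logits = model(torch.randn(4, 768), torch.randn(4, 49, 2048))
print(logits.shape)  # torch.Size([4, 2])
```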

Keywords