Alexandria Engineering Journal (Dec 2024)

FMC: Multimodal fake news detection based on multi-granularity feature fusion and contrastive learning

  • Facheng Yan,
  • Mingshu Zhang,
  • Bin Wei,
  • Kelan Ren,
  • Wen Jiang

DOI
https://doi.org/10.1016/j.aej.2024.08.103
Journal volume & issue
Vol. 109
pp. 376 – 393

Abstract

The automatic detection of multimodal fake news has recently garnered significant attention. However, existing detection methods mainly focus on merging textual and visual features, and fail to make full use of multimodal data from a multi-granularity perspective. Additionally, emerging pre-trained multimodal learning models and powerful contrastive learning methodologies remain underutilized in this domain. To address these challenges, we introduce a multimodal fake news detection framework (FMC) that integrates multi-granularity feature fusion with contrastive learning. Initially, FMC utilizes a range of pre-trained models to extract textual and visual features at various levels of granularity. Subsequently, a multi-granularity fused multimodal news representation is created through cross-modal alignment and textual–visual co-attention. The final step classifies the news as true or fake, leveraging a combination of multi-head self-attention mechanisms and a contrastive learning auxiliary task. This auxiliary task minimizes the distance between news representations that share the same label in the multimodal feature space, and maximizes the distance between representations with differing labels. Comprehensive experiments on three real-world datasets demonstrate the superior effectiveness of the proposed framework, which significantly surpasses current state-of-the-art methods.
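
To illustrate the kind of auxiliary objective the abstract describes, the sketch below implements a generic supervised contrastive loss over fused news embeddings: representations sharing a label are pulled together, representations with differing labels are pushed apart. This is a minimal PyTorch sketch under assumptions, not the paper's exact formulation; the function name, temperature value, and tensor shapes are illustrative choices, not details taken from FMC.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features: torch.Tensor,
                                labels: torch.Tensor,
                                temperature: float = 0.1) -> torch.Tensor:
    """Auxiliary loss: pull same-label news embeddings together, push different labels apart.

    features: (batch, dim) fused multimodal news representations (hypothetical input)
    labels:   (batch,) class labels, e.g. 0 = real, 1 = fake
    """
    feats = F.normalize(features, dim=1)                 # work in cosine-similarity space
    sim = feats @ feats.T / temperature                  # pairwise similarity matrix
    n = feats.size(0)
    diag = torch.eye(n, dtype=torch.bool, device=feats.device)
    sim = sim.masked_fill(diag, float("-inf"))           # exclude self-pairs
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~diag
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_counts = pos_mask.sum(dim=1)
    valid = pos_counts > 0                               # anchors with at least one positive
    mean_log_prob_pos = (log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1)[valid]
                         / pos_counts[valid])
    return -mean_log_prob_pos.mean()

# Example usage with random embeddings and binary labels:
# loss = supervised_contrastive_loss(torch.randn(8, 256), torch.randint(0, 2, (8,)))
```

In a setup like the one described, such a loss would be added to the standard classification loss as an auxiliary term, encouraging the multimodal feature space to cluster by veracity label.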

Keywords