Intelligent Systems with Applications (Sep 2024)

AdaFN-AG: Enhancing multimodal interaction with Adaptive Feature Normalization for multimodal sentiment analysis

  • Weilong Liu,
  • Hua Xu,
  • Yu Hua,
  • Yunxian Chi,
  • Kai Gao

Journal volume & issue
Vol. 23
p. 200410

Abstract


In multimodal sentiment analysis, achieving effective fusion of text, acoustic, and visual modalities for improved sentiment prediction is a central research problem. Recent studies typically employ tensor-based or attention-based mechanisms for multimodal fusion; however, the former yields unsatisfactory prediction performance, while the latter complicates the computation of fusion between non-textual modalities. This paper therefore proposes a multimodal sentiment analysis model based on Adaptive Feature Normalization and an Attention Gating mechanism (AdaFN-AG). First, for the highly synchronized non-textual modalities, we design the Adaptive Feature Normalization (AdaFN) method, which focuses on the interaction of sentiment features rather than on temporal alignment. In AdaFN, acoustic and visual features interact across modalities through normalization, inverse normalization, and mix-up operations, with weights adaptively regulating the strength of the cross-modal interaction. Meanwhile, we design an Attention Gating mechanism that enables cross-modal interaction between textual and non-textual modalities through cross-attention and captures temporal associations, while a gating module regulates the intensity of these interactions. In addition, we employ self-attention to capture the intrinsic correlations within single-modal features. We conduct experiments on three benchmark datasets for multimodal sentiment analysis; the results show that AdaFN-AG outperforms the baselines on most evaluation metrics. These experiments validate that AdaFN-AG improves performance by applying an appropriate method to each type of cross-modal interaction while conserving computational resources, and they also verify the generalization capability of the AdaFN method.
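
The abstract does not spell out how AdaFN is implemented. The sketch below is a minimal PyTorch illustration, assuming an AdaIN-style interaction in which one modality's features are normalized, re-scaled with the other modality's statistics (the "inverse normalization"), and then mixed back with the originals under a weight that controls interaction strength. The module name, tensor shapes, and the sigmoid-weight parameterization are assumptions for illustration, not the authors' code.

import torch
import torch.nn as nn

class AdaFNSketch(nn.Module):
    """Illustrative sketch of an AdaIN-style cross-modal interaction:
    normalize one modality's features, re-scale them with the other
    modality's statistics, and mix the result with the original
    features under an adaptive strength weight."""

    def __init__(self, eps: float = 1e-6):
        super().__init__()
        # Hypothetical mixing parameter; the paper's exact
        # parameterization is not given in the abstract.
        self.alpha = nn.Parameter(torch.tensor(0.5))
        self.eps = eps

    def forward(self, src: torch.Tensor, ref: torch.Tensor) -> torch.Tensor:
        # src, ref: (batch, seq_len, dim) features of the two non-textual modalities
        mu_s, std_s = src.mean(dim=1, keepdim=True), src.std(dim=1, keepdim=True)
        mu_r, std_r = ref.mean(dim=1, keepdim=True), ref.std(dim=1, keepdim=True)
        normalized = (src - mu_s) / (std_s + self.eps)   # normalization
        transferred = normalized * std_r + mu_r          # inverse normalization with ref statistics
        w = torch.sigmoid(self.alpha)                    # adaptive strength in (0, 1)
        return w * transferred + (1.0 - w) * src         # mix-up of transferred and original features

In this reading, the weight w plays the adaptive-regulation role described in the abstract: it decides how strongly the reference modality's statistics reshape the source modality's features, without requiring any attention computation between the two non-textual streams.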

Keywords