Jisuanji Kexue yu Tansuo (Journal of Frontiers of Computer Science and Technology), May 2024

Temporal Multimodal Sentiment Analysis with a Composite Cross-Modal Interaction Network

  • YANG Li, ZHONG Junhong, ZHANG Yun, SONG Xinyu

DOI
https://doi.org/10.3778/j.issn.1673-9418.2311004
Journal volume & issue
Vol. 18, no. 5
pp. 1318 – 1327

Abstract


To address the insufficient modal fusion and weak interactivity caused by semantic feature differences between modalities in multimodal sentiment analysis, a temporal multimodal sentiment analysis model with a composite cross-modal interaction network (CCIN-SA) is constructed by analyzing the latent correlations between modalities. The model first uses a bidirectional gated recurrent unit (BiGRU) and a multi-head attention mechanism to extract temporal features of the text, visual, and audio modalities enriched with contextual semantic information. A cross-modal attention interaction layer is then designed to continually reinforce the target modality with low-order signals from the auxiliary modalities, so that the target modality learns information from the auxiliary modalities and captures the latent adaptability between them. The enhanced features are then fed into a composite feature fusion layer, which further captures the similarity between modalities through condition vectors, strengthens the correlation of important features, and mines deeper inter-modal interactivity. Finally, a multi-head attention mechanism concatenates and fuses the composite cross-modal enhanced features with the low-order signals, increasing the weight of important features within each modality while preserving the unique feature information of the original modalities, and the resulting multimodal fused features are used for the final sentiment classification task. The model is evaluated on the CMU-MOSI and CMU-MOSEI datasets, and the results show improvements in accuracy and F1 score over existing models, indicating that CCIN-SA can effectively exploit the correlations between modalities and make more accurate sentiment judgments.
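To make the described pipeline concrete, the sketch below shows a minimal PyTorch layout of the stages named in the abstract: per-modality BiGRU plus multi-head attention encoding, cross-modal attention that reinforces a target modality with auxiliary modalities, and a fused multi-head attention head for the sentiment prediction. It is an illustrative reconstruction only; the class names, feature dimensions, number of heads, condition-vector details, and pooling are assumptions, not the authors' released implementation.

```python
# Rough sketch of a CCIN-SA-style pipeline (hypothetical sizes and names).
import torch
import torch.nn as nn

class UnimodalEncoder(nn.Module):
    """BiGRU + multi-head self-attention over one modality's time series."""
    def __init__(self, in_dim, hid_dim, n_heads=4):
        super().__init__()
        self.bigru = nn.GRU(in_dim, hid_dim, batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(2 * hid_dim, n_heads, batch_first=True)

    def forward(self, x):                      # x: (batch, seq, in_dim)
        h, _ = self.bigru(x)                   # (batch, seq, 2*hid_dim)
        h, _ = self.attn(h, h, h)              # contextual temporal features
        return h

class CrossModalAttention(nn.Module):
    """Target modality queries an auxiliary modality's features."""
    def __init__(self, dim, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, target, auxiliary):
        out, _ = self.attn(target, auxiliary, auxiliary)
        return out + target                    # residual keeps the low-order signal

class CCINSA(nn.Module):
    """Toy composite cross-modal interaction network for sentiment prediction."""
    def __init__(self, dims, hid_dim=64):
        super().__init__()
        self.encoders = nn.ModuleList([UnimodalEncoder(d, hid_dim) for d in dims])
        d = 2 * hid_dim
        self.cross = nn.ModuleList([CrossModalAttention(d) for _ in range(6)])
        self.fuse_attn = nn.MultiheadAttention(3 * d, 4, batch_first=True)
        self.head = nn.Linear(3 * d, 1)        # sentiment score

    def forward(self, text, visual, audio):
        t, v, a = [enc(x) for enc, x in zip(self.encoders, (text, visual, audio))]
        # Each target modality is reinforced by the two auxiliary modalities.
        t2 = self.cross[0](t, v) + self.cross[1](t, a)
        v2 = self.cross[2](v, t) + self.cross[3](v, a)
        a2 = self.cross[4](a, t) + self.cross[5](a, v)
        fused = torch.cat([t2, v2, a2], dim=-1)            # composite features
        fused, _ = self.fuse_attn(fused, fused, fused)     # reweight important features
        return self.head(fused.mean(dim=1))                # pooled prediction

# Example with CMU-MOSI-like feature sizes (hypothetical dimensions).
model = CCINSA(dims=(300, 35, 74))
score = model(torch.randn(2, 50, 300),
              torch.randn(2, 50, 35),
              torch.randn(2, 50, 74))
```

The residual connection in the cross-modal block and the final concatenation stand in for the abstract's "low-order signal" preservation; the paper's actual composite fusion layer with condition vectors is likely more elaborate.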

Keywords