IEEE Access (Jan 2024)
Multimodal Sentiment Analysis of Government Information Comments Based on Contrastive Learning and Cross-Attention Fusion Networks
Abstract
Accurate identification of sentiment in government-related comments is crucial for policymakers to understand public opinion in depth, adjust policies promptly, and enhance overall satisfaction. We therefore propose a multimodal sentiment analysis model for government information comments based on contrastive learning and a cross-attention fusion network. First, we collect text-image comments from the Politics and Law section of the Today’s Headlines app and extract textual and visual features. We fine-tune the model with LoRA, optimizing the feature representation through low-rank adjustments to the fused features. Second, we employ contrastive learning with reverse prediction to analyze intra-class and inter-class cross-modal dynamics. We then design a fusion network that uses cross-attention to learn the complementary relationships between modalities. Finally, the features are combined through a fully connected layer. Experiments show that the model achieves 96.80% accuracy in recognizing sentiment polarity, an improvement of 10.21% over the multimodal model CLIP. The model can assist governments in analyzing sentiment evolution, detecting public opinion, and guiding online public opinion.
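For illustration, the sketch below shows how a bidirectional cross-attention fusion head of the kind the abstract describes might look in PyTorch: each modality attends to the other, and the attended features are pooled, concatenated, and classified by a fully connected layer. The feature dimension, head count, mean pooling, and three-way polarity output are assumptions made for this example, not the authors' actual implementation.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Illustrative cross-attention fusion head (assumed design, not the
    paper's code): text attends over image features and vice versa, then a
    fully connected layer maps the fused representation to polarity logits."""

    def __init__(self, dim=768, num_heads=8, num_classes=3):
        super().__init__()
        # Text queries attend over image keys/values, and the reverse.
        self.text_to_image = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.image_to_text = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, text_feats, image_feats):
        # text_feats: (batch, text_len, dim); image_feats: (batch, patches, dim)
        t2i, _ = self.text_to_image(text_feats, image_feats, image_feats)
        i2t, _ = self.image_to_text(image_feats, text_feats, text_feats)
        # Mean-pool each attended sequence, concatenate, and classify.
        fused = torch.cat([t2i.mean(dim=1), i2t.mean(dim=1)], dim=-1)
        return self.classifier(fused)

# Random tensors stand in for the extracted textual and visual features.
model = CrossAttentionFusion()
logits = model(torch.randn(4, 32, 768), torch.randn(4, 49, 768))
print(logits.shape)  # torch.Size([4, 3])
```

The two attention directions capture the complementary text-to-image and image-to-text relationships the abstract refers to; the concatenation-plus-linear step corresponds to the final fully connected combination.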
Keywords