Applied Artificial Intelligence (Dec 2024)

An Image-Text Sentiment Analysis Method Using Multi-Channel Multi-Modal Joint Learning

  • Lianting Gong,
  • Xingzhou He,
  • Jianzhong Yang

DOI
https://doi.org/10.1080/08839514.2024.2371712
Journal volume & issue
Vol. 38, no. 1

Abstract

Multimodal sentiment analysis integrates multiple modalities to analyze sentiment tendencies or emotional states. Existing approaches suffer from redundancy in independent modal features and a lack of correlation analysis between modalities, which leads to insufficient fusion and reduced accuracy. To address these issues, this study proposes a multi-channel multimodal joint learning method for image-text sentiment analysis. First, a multi-channel feature extraction module is introduced to comprehensively capture image or text features. Second, effective interaction of multimodal features is achieved by designing modality-wise interaction modules that eliminate redundant features through cross-modal cross-attention. Finally, to account for the complementary role of contextual information in sentiment analysis, an adaptive multi-task fusion method merges single-modal context features with multimodal features to enhance the reliability of sentiment predictions. Experimental results show that the proposed method achieves accuracies of 76.98% and 75.32% on the MVSA-Single and MVSA-Multiple datasets, with F1 scores of 76.23% and 75.29%, respectively, outperforming other state-of-the-art methods. This research provides new insights and methods for multimodal feature fusion, improving the accuracy and practicality of sentiment analysis.
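
The abstract does not give the details of the modality-wise interaction modules, so the following is only a generic illustration of cross-modal cross-attention, not the authors' implementation. It is a minimal NumPy sketch of scaled dot-product attention in which text features attend over image features and vice versa. The function names, the feature dimensions, the random inputs, and the omission of learned query/key/value projections are all assumptions made for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context):
    """Scaled dot-product attention: each query row attends over the context rows.

    Assumption: no learned projections; queries and context share a feature size.
    """
    d = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(d)   # shape (n_queries, n_context)
    weights = softmax(scores, axis=-1)          # each row sums to 1
    return weights @ context, weights

rng = np.random.default_rng(0)
text_feats = rng.normal(size=(5, 8))    # hypothetical: 5 token features of size 8
image_feats = rng.normal(size=(3, 8))   # hypothetical: 3 region features of size 8

# Text queries attend over image regions, and image queries attend over text tokens.
text_attended, w_text_to_image = cross_attention(text_feats, image_feats)
image_attended, w_image_to_text = cross_attention(image_feats, text_feats)
```

In this sketch, a context feature that receives low attention weight from every query contributes little to the attended output, which is one plausible reading of how cross-attention can suppress redundant features.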