Jisuanji kexue yu tansuo (May 2024)
Sentiment Analysis Combining Dynamic Gradient and Multi-view Co-attention
Abstract
To address the problems of imbalanced inter-modal optimization and inadequate fusion of multimodal features in multimodal sentiment analysis, a multimodal sentiment analysis model combining a dynamic gradient mechanism and a multi-view co-attention mechanism (DG-MCM) is proposed, which effectively mines unimodal representations and fully fuses multimodal information. Firstly, the model uses the pre-trained model BERT (bidirectional encoder representations from transformers) and stacked long short-term memory networks (SLSTM) to learn text, audio, and video features, and introduces a dynamic gradient mechanism that monitors the contribution difference and learning speed of each modality to guide its feature learning. Secondly, the resulting features of the different modalities are fused with the multi-view co-attention mechanism: each pair of modalities is projected into multiple subspaces for interaction, yielding more adequate fusion features. Finally, the fusion features and unimodal features are concatenated for sentiment prediction. Experimental results on the CMU-MOSI and CMU-MOSEI datasets show that the model fully learns both unimodal and cross-modal information and effectively improves the accuracy of multimodal sentiment analysis.
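To make the first step concrete, the sketch below shows one plausible way a dynamic gradient mechanism of the kind the abstract describes could work: per-modality contribution scores are monitored, and the gradients of a modality that contributes more (and thus learns faster) are damped so that weaker modalities keep optimizing. The scoring inputs, the tanh-based scaling rule, and all names here are illustrative assumptions, not the authors' DG-MCM formulation.

```python
# Hedged sketch of a dynamic gradient mechanism: damp the gradients of
# over-contributing modalities so optimization stays balanced across modalities.
import math
import torch
import torch.nn as nn

def modulate_gradients(modality_params, contributions, alpha=0.5):
    """Scale each modality's gradients by how far its contribution
    exceeds the cross-modality average (stronger -> smaller step)."""
    mean_c = sum(contributions.values()) / len(contributions)
    for name, params in modality_params.items():
        ratio = contributions[name] / (mean_c + 1e-8)
        # Damp over-contributing modalities; leave the others untouched.
        coeff = 1.0 if ratio <= 1.0 else 1.0 - math.tanh(alpha * (ratio - 1.0))
        for p in params:
            if p.grad is not None:
                p.grad.mul_(coeff)

# Toy usage: two stand-in unimodal encoders and made-up contribution scores.
text_enc, audio_enc = nn.Linear(128, 1), nn.Linear(64, 1)
loss = text_enc(torch.randn(4, 128)).mean() + audio_enc(torch.randn(4, 64)).mean()
loss.backward()
modulate_gradients(
    {"text": list(text_enc.parameters()), "audio": list(audio_enc.parameters())},
    contributions={"text": 0.8, "audio": 0.2},  # e.g. per-modality loss-drop rates
)
```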
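For the second step, the following minimal PyTorch sketch illustrates pairwise multi-view co-attention in the sense the abstract describes: two modalities are projected into several subspaces ("views"), each view performs cross-modal attention, and the view outputs are merged back. The module name, dimensions, and number of views are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch: one modality attends to another across several subspaces.
import torch
import torch.nn as nn

class MultiViewCoAttention(nn.Module):
    """Project a modality pair into multiple subspaces and let each
    subspace attend across modalities; merge the view outputs."""
    def __init__(self, dim: int, num_views: int = 4):
        super().__init__()
        assert dim % num_views == 0
        self.num_views = num_views
        self.head_dim = dim // num_views
        self.q_a = nn.Linear(dim, dim)  # queries from modality a
        self.k_b = nn.Linear(dim, dim)  # keys from modality b
        self.v_b = nn.Linear(dim, dim)  # values from modality b
        self.out = nn.Linear(dim, dim)

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # a: (batch, len_a, dim), b: (batch, len_b, dim)
        B, La, D = a.shape
        Lb = b.size(1)
        # Split each projection into num_views subspaces.
        q = self.q_a(a).view(B, La, self.num_views, self.head_dim).transpose(1, 2)
        k = self.k_b(b).view(B, Lb, self.num_views, self.head_dim).transpose(1, 2)
        v = self.v_b(b).view(B, Lb, self.num_views, self.head_dim).transpose(1, 2)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        fused = (attn @ v).transpose(1, 2).reshape(B, La, D)
        return self.out(fused)  # modality-a features enriched by modality b

# Usage: fuse text with audio features (shapes are illustrative).
text, audio = torch.randn(8, 50, 128), torch.randn(8, 200, 128)
coattn = MultiViewCoAttention(dim=128, num_views=4)
text_given_audio = coattn(text, audio)  # (8, 50, 128)
```

Applying such a module in both directions for every modality pair, then concatenating the outputs with the unimodal features, would match the pipeline the abstract outlines.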
Keywords