IEEE Access (Jan 2022)

MEDT: Using Multimodal Encoding-Decoding Network as in Transformer for Multimodal Sentiment Analysis

  • Qingfu Qi
  • Liyuan Lin
  • Rui Zhang
  • Chengrong Xue

DOI: https://doi.org/10.1109/ACCESS.2022.3157712
Journal volume & issue: Vol. 10, pp. 28750–28759

Abstract

Multimodal sentiment analysis is a challenging task in the field of natural language processing (NLP). It uses the multimodal signals in a video (natural language, facial gestures, and acoustic behavior) to derive an understanding of emotion. However, the importance of any single modality to the emotional outcome is not static: as the time dimension extends, the emotional attributes of a given natural-language utterance are affected by the accompanying non-language data, producing a vector shift in the feature space. At the same time, both the long-term dependencies within a single modality and the long-term dependencies between multiple modalities that are "unaligned" in time must be considered. To address these problems, this paper proposes a Multimodal Encoding-Decoding network with Transformer (MEDT). The model encodes the multimodal data with a Bidirectional Encoder Representations from Transformers (BERT) network and a Transformer encoder to resolve the long-term dependencies within each modality, and it reconstructs the Transformer decoder to weight the multimodal data iteratively. The network thus accounts for both the long-term dependencies between modalities and the shifting effect that non-language data exerts on natural-language data. Under identical experimental conditions, we validated the model on standard multimodal sentiment analysis datasets; compared with state-of-the-art models, the network achieves solid improvements and strong stability.
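The abstract describes the architecture only at a high level. As a rough illustration of the encode-then-decode fusion pattern it outlines, here is a minimal PyTorch sketch, assuming stock nn.Transformer components: plain Transformer encoders stand in for the paper's BERT language encoder, and a standard Transformer decoder stands in for the authors' reconstructed, iteratively weighting decoder. All names and hyperparameters (MEDTSketch, d_model=128, and so on) are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn


def make_encoder(d_model: int, nhead: int, num_layers: int) -> nn.TransformerEncoder:
    # Per-modality encoder; a stand-in for BERT on the language stream.
    layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
    return nn.TransformerEncoder(layer, num_layers)


class MEDTSketch(nn.Module):
    # Hypothetical reconstruction from the abstract alone: encode each
    # modality separately to resolve intra-modal long-term dependencies,
    # then let the language features attend (via a Transformer decoder)
    # to the concatenated acoustic/visual features, so that non-language
    # data can shift the language representation in feature space.
    def __init__(self, d_model: int = 128, nhead: int = 4, num_layers: int = 2):
        super().__init__()
        self.text_enc = make_encoder(d_model, nhead, num_layers)
        self.audio_enc = make_encoder(d_model, nhead, num_layers)
        self.vision_enc = make_encoder(d_model, nhead, num_layers)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers)
        self.head = nn.Linear(d_model, 1)  # scalar sentiment score

    def forward(self, text, audio, vision):
        # Inputs: (batch, seq_len, d_model). The three sequence lengths
        # may differ (the streams can be unaligned) because
        # cross-attention needs no time-step correspondence.
        t = self.text_enc(text)
        context = torch.cat([self.audio_enc(audio), self.vision_enc(vision)], dim=1)
        fused = self.decoder(tgt=t, memory=context)  # language queries non-language context
        return self.head(fused.mean(dim=1))  # mean-pool over time, then predict


model = MEDTSketch()
score = model(torch.randn(2, 20, 128),  # language: 20 steps
              torch.randn(2, 50, 128),  # audio: 50 steps (unaligned)
              torch.randn(2, 30, 128))  # vision: 30 steps (unaligned)
print(score.shape)  # torch.Size([2, 1])
```

Treating the language stream as the decoder query and the other modalities as its memory mirrors the abstract's emphasis on how non-language data offsets the language representation; the paper's actual decoder reconstruction and iterative weighting scheme may differ from this sketch.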

Keywords