IEEE Access (Jan 2024)

Cross-Modal Dynamic Transfer Learning for Multimodal Emotion Recognition

  • Soyeon Hong,
  • Hyeoungguk Kang,
  • Hyunsouk Cho

DOI
https://doi.org/10.1109/ACCESS.2024.3356185
Journal volume & issue
Vol. 12
pp. 14324 – 14333

Abstract

Multimodal Emotion Recognition is an important research area for developing human-centric applications, especially on video platforms. Most existing models attempt to develop sophisticated fusion techniques to integrate heterogeneous features from different modalities. However, these fusion methods can degrade performance because not all modalities help establish the semantic alignment needed for emotion prediction. We observed that, for an existing fusion model, predictions improve on 8.0% of the misclassified instances when one of the input modalities is masked. Based on this observation, we propose a representation learning method called Cross-modal DynAmic Transfer learning (CDaT), which dynamically filters the low-confidence modality and complements it with the high-confidence modality through uni-modal masking and cross-modal representation transfer learning. We train an auxiliary network that learns model confidence scores to determine which modality has low confidence and how much transfer should occur from the other modalities. Furthermore, CDaT can be combined with any fusion model in a model-agnostic way because it transfers knowledge between low-level uni-modal representations via a probabilistic knowledge transfer loss. Experiments with four different state-of-the-art fusion models on the CMU-MOSEI and IEMOCAP emotion recognition datasets demonstrate the effectiveness of CDaT.

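The abstract describes two components: an auxiliary network that scores how confident each uni-modal representation is, and a probabilistic knowledge transfer loss that moves information from the high-confidence modality to the low-confidence one. The sketch below is only meant to illustrate that mechanism; the module names (ConfidenceNet, pkt_loss, cross_modal_transfer_loss), the confidence-network architecture, the batch-level gating, and the pairwise-similarity form of the transfer loss are assumptions made for illustration, not the authors' implementation.

```python
# Minimal sketch (not the paper's code) of confidence-gated cross-modal transfer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConfidenceNet(nn.Module):
    """Auxiliary network that scores a uni-modal representation's confidence (assumed architecture)."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, dim // 2), nn.ReLU(),
                                   nn.Linear(dim // 2, 1))

    def forward(self, h):                                  # h: (batch, dim)
        return torch.sigmoid(self.score(h)).squeeze(-1)    # (batch,)

def pkt_loss(h_student, h_teacher, temperature=1.0):
    """Probabilistic knowledge transfer: match the pairwise-similarity distributions
    of the two representation batches (one common formulation; details assumed)."""
    def sim_dist(h):
        h = F.normalize(h, dim=-1)
        sim = h @ h.t() / temperature                      # (batch, batch)
        return F.softmax(sim, dim=-1)
    p_teacher = sim_dist(h_teacher).detach()
    p_student = sim_dist(h_student)
    return F.kl_div(p_student.log(), p_teacher, reduction="batchmean")

def cross_modal_transfer_loss(h_a, h_b, conf_net_a, conf_net_b):
    """Transfer from the higher-confidence modality to the lower-confidence one,
    weighted by the confidence gap (batch-level gating is a simplifying assumption)."""
    c_a, c_b = conf_net_a(h_a), conf_net_b(h_b)            # (batch,)
    gap = (c_a - c_b).mean()
    if gap >= 0:   # modality A is more confident on average: A teaches B
        return gap.abs() * pkt_loss(h_b, h_a)
    else:          # modality B teaches A
        return gap.abs() * pkt_loss(h_a, h_b)
```

In this reading, the transfer loss is added to the fusion model's task loss, which is what would make the approach model-agnostic: it operates on the uni-modal representations before fusion rather than on any particular fusion architecture.
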
Keywords