Applied Sciences (Apr 2024)

Improvement of Multimodal Emotion Recognition Based on Temporal-Aware Bi-Direction Multi-Scale Network and Multi-Head Attention Mechanisms

  • Yuezhou Wu,
  • Siling Zhang,
  • Pengfei Li

DOI
https://doi.org/10.3390/app14083276
Journal volume & issue
Vol. 14, no. 8
p. 3276

Abstract

Read online

Emotion recognition is a crucial research area in natural language processing (NLP), aiming to identify emotional states such as happiness, anger, and sadness from various sources like speech, text, and facial expressions. In this paper, we propose an improved MMER (multimodal emotion recognition) method using TIM-Net (Temporal-Aware Bi-Direction Multi-Scale Network) and attention mechanisms. Firstly, we introduce the methods for extracting and fusing the multimodal features. Then, we present the TIM-Net and attention mechanisms, which are utilized to enhance the MMER algorithm. We evaluate our approach on the IEMOCAP and MELD datasets, and compared to existing methods, our approach demonstrates superior performance. The weighted accuracy recall (WAR) on the IEMOCAP dataset is 83.9%, and the weighted accuracy recall rate on the MELD dataset is 62.7%. Finally, the impact of the TIM-Net model and the attention mechanism on the emotion recognition performance is further investigated through ablation experiments.

Keywords