Applied Sciences (Oct 2023)

M2ER: Multimodal Emotion Recognition Based on Multi-Party Dialogue Scenarios

  • Bo Zhang
  • Xiya Yang
  • Ge Wang
  • Ying Wang
  • Rui Sun

DOI
https://doi.org/10.3390/app132011340
Journal volume & issue
Vol. 13, no. 20
p. 11340

Abstract

Multimodal emotion recognition has attracted considerable recent research attention, but recognizing emotions in multi-party dialogue scenarios remains difficult. Most studies use only the text and audio modalities and ignore the video modality. To address this, we propose M2ER, a multimodal emotion recognition scheme for multi-party dialogue scenarios. Because multiple faces can appear in the same video frame, M2ER introduces a speaker recognition method based on multi-face localization to eliminate interference from non-speakers. An attention mechanism is used to fuse the different modalities for classification. We conducted extensive unimodal and multimodal fusion experiments on the multi-party dialogue dataset MELD. The results show that M2ER achieves better emotion recognition than the baseline model in both the text and audio modalities. In the video modality, the proposed method with speaker recognition improves emotion recognition performance by 6.58% compared to the method without speaker recognition. In addition, the attention-based multimodal fusion outperforms the baseline fusion model.
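
The abstract does not specify how the attention-based fusion is formulated, so the following is only a minimal sketch of one common variant: scaled dot-product attention over one feature vector per modality. The function names `softmax` and `attention_fuse`, the `query` vector (which would normally be learned), and the toy feature values are all illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(features, query):
    """Fuse per-modality feature vectors with scaled dot-product attention.

    features: dict mapping a modality name to a vector (all of equal length).
    query:    vector of the same length; hypothetical here, learned in practice.
    Returns the fused vector and the attention weight given to each modality.
    """
    names = list(features)
    dim = len(query)
    # Score each modality by its scaled dot product with the query.
    scores = [
        sum(f * q for f, q in zip(features[name], query)) / math.sqrt(dim)
        for name in names
    ]
    weights = softmax(scores)
    # The fused representation is the attention-weighted sum of the features.
    fused = [
        sum(w * features[name][i] for w, name in zip(weights, names))
        for i in range(dim)
    ]
    return fused, dict(zip(names, weights))


if __name__ == "__main__":
    # Toy 2-dimensional features standing in for real encoder outputs.
    toy = {"text": [1.0, 0.0], "audio": [0.0, 1.0], "video": [1.0, 1.0]}
    fused, weights = attention_fuse(toy, query=[1.0, 1.0])
    print(weights)
    print(fused)
```

In a full system the fused vector would be passed to a classifier over the emotion labels; the sketch stops at the fusion step because the abstract gives no further detail.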

Keywords