IEEE Access (Jan 2019)

A Hybrid Latent Space Data Fusion Method for Multimodal Emotion Recognition

  • Shahla Nemati,
  • Reza Rohani,
  • Mohammad Ehsan Basiri,
  • Moloud Abdar,
  • Neil Y. Yen,
  • Vladimir Makarenkov

DOI
https://doi.org/10.1109/ACCESS.2019.2955637
Journal volume & issue
Vol. 7
pp. 172948–172964

Abstract

Multimodal emotion recognition is an emerging interdisciplinary field of research in the area of affective computing and sentiment analysis. It aims to exploit the information carried by signals of different natures to make emotion recognition systems more accurate. This is achieved by employing a powerful multimodal fusion method. In this study, a hybrid multimodal data fusion method is proposed in which the audio and visual modalities are fused using a latent space linear map; their features, projected into the cross-modal space, are then fused with the textual modality using a Dempster-Shafer (DS) theory-based evidential fusion method. The evaluation of the proposed method on the videos of the DEAP dataset shows its superiority over both decision-level and non-latent-space fusion methods. Furthermore, the results reveal that employing Marginal Fisher Analysis (MFA) for feature-level audio-visual fusion yields a greater improvement than cross-modal factor analysis (CFA) or canonical correlation analysis (CCA). Finally, the results show that exploiting users' textual comments together with the audiovisual content of movies improves the system's performance.
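
The abstract outlines a two-stage pipeline: a latent-space linear map that fuses audio and visual features, followed by DS-theoretic combination with the text modality. The sketch below is a minimal illustration of those two stages, not the authors' implementation: it assumes CCA for the latent-space map (the paper also evaluates CFA and MFA, with MFA performing best) and a generic implementation of Dempster's rule of combination. All feature dimensions, variable names, and mass values are hypothetical.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# --- Stage 1: latent-space audio-visual feature fusion (CCA variant) ---
rng = np.random.default_rng(0)
X_audio = rng.normal(size=(200, 40))  # 200 clips x 40 audio features (assumed)
X_video = rng.normal(size=(200, 60))  # 200 clips x 60 visual features (assumed)

cca = CCA(n_components=10)
Z_audio, Z_video = cca.fit_transform(X_audio, X_video)  # projections into the cross-modal space
X_av = np.hstack([Z_audio, Z_video])  # fused audio-visual representation fed to a classifier

# --- Stage 2: Dempster-Shafer evidential fusion with the text modality ---
def dempster_combine(m1, m2):
    """Dempster's rule of combination for two mass functions given as
    dicts mapping frozenset focal elements (sets of emotion classes)
    to masses. Conflicting mass is discarded and the rest renormalized."""
    combined = {}
    conflict = 0.0
    for A, mA in m1.items():
        for B, mB in m2.items():
            inter = A & B
            if inter:
                combined[inter] = combined.get(inter, 0.0) + mA * mB
            else:
                conflict += mA * mB
    if conflict >= 1.0:
        raise ValueError("total conflict: sources are incompatible")
    return {A: v / (1.0 - conflict) for A, v in combined.items()}

# Illustrative mass functions from the audio-visual and text classifiers
# over two classes, 'H' (high valence) and 'L' (low valence), with
# residual mass on the full frame {H, L} expressing ignorance.
m_av   = {frozenset("H"): 0.6, frozenset("L"): 0.2, frozenset("HL"): 0.2}
m_text = {frozenset("H"): 0.5, frozenset("L"): 0.3, frozenset("HL"): 0.2}
print(dempster_combine(m_av, m_text))
```

For the illustrative masses above, Dempster's rule reinforces the two sources' agreement on the high-valence class (its combined mass rises to about 0.72) while the 0.28 of conflicting mass is discarded by renormalization.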

Keywords