IEEE Access (Jan 2023)

Deep Learning Approaches for Bimodal Speech Emotion Recognition: Advancements, Challenges, and a Multi-Learning Model

  • Samuel Kakuba,
  • Alwin Poulose,
  • Dong Seog Han

DOI
https://doi.org/10.1109/ACCESS.2023.3325037
Journal volume & issue
Vol. 11
pp. 113769 – 113789

Abstract

Read online

Though acoustic speech emotion recognition has been studied for a while, bimodal speech emotion recognition using both acoustic and text has gained momentum since speech emotion recognition doesn’t only involve the acoustic modality. However, there is less review work on the available bimodal speech emotion recognition (SER) research. The review works available mostly concentrate on the use of convolution neural networks (CNNs) and recurrent neural networks (RNNs). However, recent deep learning techniques like attention mechanisms and fusion strategies have shaped the bimodal SER research without explicit analysis of their significance when used singly or in combination with the traditional deep learning techniques. We therefore, review the recently published literature that involves these deep learning techniques in this paper to ascertain the current trends and challenges of bimodal SER research that have hampered it to be fully deployed in the natural environment for off-the-shelf SER applications. In addition, we carried out experiments to ascertain the optimal combination of acoustic features and the significance of the attention mechanisms and their combination with the traditional deep learning techniques. We propose a multi-technique model called the deep learning-based multi-learning model for emotion recognition (DBMER) that operates with multi-learning capabilities of CNNs, RNNs, and multi-head attention mechanisms. We noted that attention mechanisms play a pivotal role in the performance of bimodal dyadic SER systems. However, few publicly available datasets, the difficulty in acquisition of bimodal SER data, cross-corpus and multilingual studies remain open problems in bimodal SER research. Our experiments on the proposed DBMER model showed that though each of the deep learning techniques benefits the task, the results are more accurate and robust when they are used in careful combination with multi-level fusion approaches.

Keywords