Sensors (Feb 2024)

A New Network Structure for Speech Emotion Recognition Research

  • Chunsheng Xu,
  • Yunqing Liu,
  • Wenjun Song,
  • Zonglin Liang,
  • Xing Chen

DOI
https://doi.org/10.3390/s24051429
Journal volume & issue
Vol. 24, no. 5
p. 1429

Abstract

Deep learning has driven breakthroughs in emotion recognition across many fields, especially speech emotion recognition (SER). Extracting the most relevant acoustic features is a central part of SER and has long attracted researchers' attention. The emotional information in speech signals is dispersed, and local and global information are not integrated comprehensively. To address this problem, this paper presents a network model based on a bidirectional gated recurrent unit (Bi-GRU) and multi-head attention. We evaluate the proposed model on the IEMOCAP and Emo-DB corpora. The experimental results show that the model based on Bi-GRU and multi-head attention significantly outperforms traditional network models on multiple evaluation metrics. We also apply the model to a speech sentiment analysis task, where it shows strong generalization performance on the CH-SIMS and MOSI datasets.
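The abstract names the architecture but not its configuration. The following is a minimal, untrained NumPy forward pass that illustrates one plausible reading of "Bi-GRU plus multi-head attention" for utterance-level SER. It assumes frame-level acoustic input (for example 40-dimensional log-Mel frames). The Bi-GRU models sequential context in both directions, and multi-head self-attention lets every frame attend to every other frame, so distant cues are combined. Mean pooling then yields one utterance vector, and a softmax layer produces class probabilities. All function names, layer sizes, the mean pooling, and the four-class output are hypothetical choices made for illustration, not the authors' published design.

```python
import numpy as np

# Hypothetical sketch: untrained Bi-GRU + multi-head self-attention classifier.
# Sizes, pooling, and names are illustrative assumptions, not the paper's model.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def init_gru(input_dim, hidden_dim, rng):
    # Stacked weights for the update (z), reset (r) and candidate (n) gates.
    s = 1.0 / np.sqrt(hidden_dim)
    return {"W": rng.uniform(-s, s, (3 * hidden_dim, input_dim)),
            "U": rng.uniform(-s, s, (3 * hidden_dim, hidden_dim)),
            "b": np.zeros(3 * hidden_dim)}

def gru_forward(x, p):
    """Single-direction GRU over x of shape (T, input_dim); returns (T, hidden_dim)."""
    W_z, W_r, W_n = np.split(p["W"], 3)
    U_z, U_r, U_n = np.split(p["U"], 3)
    b_z, b_r, b_n = np.split(p["b"], 3)
    h = np.zeros(p["U"].shape[1])
    outputs = []
    for x_t in x:
        z = sigmoid(W_z @ x_t + U_z @ h + b_z)        # update gate
        r = sigmoid(W_r @ x_t + U_r @ h + b_r)        # reset gate
        n = np.tanh(W_n @ x_t + U_n @ (r * h) + b_n)  # candidate state
        h = (1.0 - z) * n + z * h
        outputs.append(h)
    return np.stack(outputs)

def bigru_forward(x, p_fwd, p_bwd):
    fwd = gru_forward(x, p_fwd)
    bwd = gru_forward(x[::-1], p_bwd)[::-1]  # run on reversed input, realign to time order
    return np.concatenate([fwd, bwd], axis=-1)  # (T, 2 * hidden_dim)

def multi_head_self_attention(x, Wq, Wk, Wv, Wo, num_heads):
    """Scaled dot-product self-attention split over num_heads heads."""
    T, d_model = x.shape
    d_head = d_model // num_heads
    def split(m):  # (T, d_model) -> (heads, T, d_head)
        return m.reshape(T, num_heads, d_head).transpose(1, 0, 2)
    q, k, v = split(x @ Wq), split(x @ Wk), split(x @ Wv)
    weights = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head))  # (heads, T, T)
    context = (weights @ v).transpose(1, 0, 2).reshape(T, d_model)  # merge heads
    return context @ Wo, weights

def init_model(input_dim, hidden_dim, num_classes, rng):
    d_model = 2 * hidden_dim
    s = 1.0 / np.sqrt(d_model)
    return {"gru_fwd": init_gru(input_dim, hidden_dim, rng),
            "gru_bwd": init_gru(input_dim, hidden_dim, rng),
            "attn": [rng.normal(0.0, s, (d_model, d_model)) for _ in range(4)],  # Wq, Wk, Wv, Wo
            "W_out": rng.normal(0.0, s, (num_classes, d_model)),
            "b_out": np.zeros(num_classes)}

def classify(frames, params, num_heads=4):
    seq = bigru_forward(frames, params["gru_fwd"], params["gru_bwd"])
    attended, _ = multi_head_self_attention(seq, *params["attn"], num_heads)
    pooled = attended.mean(axis=0)  # utterance-level vector (mean pooling is an assumption)
    return softmax(params["W_out"] @ pooled + params["b_out"])

rng = np.random.default_rng(0)
params = init_model(input_dim=40, hidden_dim=32, num_classes=4, rng=rng)
frames = rng.normal(size=(100, 40))  # synthetic stand-in for 100 frames of 40-dim features
probs = classify(frames, params, num_heads=4)
print(probs.shape)  # (4,)
```

Because the weights are random, the probabilities carry no meaning. The sketch only shows how tensor shapes flow from frames through the Bi-GRU (2 × hidden size per frame), the attention heads (a T × T weight matrix per head), and the pooled classifier. A real system would train these parameters, typically with a framework such as PyTorch.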

Keywords