IEEE Access (Jan 2019)

Parallelized Convolutional Recurrent Neural Network With Spectral Features for Speech Emotion Recognition

  • Pengxu Jiang,
  • Hongliang Fu,
  • Huawei Tao,
  • Peizhi Lei,
  • Li Zhao

DOI
https://doi.org/10.1109/ACCESS.2019.2927384
Journal volume & issue
Vol. 7
pp. 90368 – 90377

Abstract

Speech is the most effective way for people to exchange complex information. Recognizing the emotional information contained in speech is one of the important challenges in the field of artificial intelligence. To better capture emotional features in speech signals, a parallelized convolutional recurrent neural network (PCRN) with spectral features is proposed for speech emotion recognition. First, frame-level features are extracted from each utterance, and a long short-term memory (LSTM) network is employed to learn these features frame by frame. At the same time, the deltas and delta-deltas of the log Mel-spectrogram are calculated and combined with it into three channels (static, delta, and delta-delta); these 3-D features are learned by a convolutional neural network (CNN). Then, the two learned high-level features are fused and batch normalized. Finally, a SoftMax classifier is used to classify emotions. The PCRN model processes the two different types of features in parallel to better learn the subtle changes in emotion. Experimental results on four public datasets show that the proposed method outperforms previous work.
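The three-channel CNN input described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the deltas are computed, so the regression formula below, the window half-width `width=2`, and the edge handling (repeating the first and last frames) are assumptions, and the function names `delta` and `three_channel` are hypothetical. The input is taken to be a log Mel-spectrogram stored as a list of frames, each a list of Mel-band values.

```python
def delta(frames, width=2):
    """Regression-based delta over time (assumed formula, not from the paper):
    d[t] = sum_{n=1..width} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..width} n^2).
    Frames beyond the edges are replaced by the first or last frame."""
    num_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(num_frames):
        row = []
        for f in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                ahead = frames[min(t + n, num_frames - 1)][f]
                behind = frames[max(t - n, 0)][f]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def three_channel(log_mel):
    """Stack static, delta, and delta-delta into a channels-first 3-D feature:
    the result is indexed as [channel][frame][mel_band]."""
    d1 = delta(log_mel)   # first-order temporal change
    d2 = delta(d1)        # second-order temporal change
    return [log_mel, d1, d2]


if __name__ == "__main__":
    # Toy input: 6 frames, 2 Mel bands (values are illustrative only).
    toy = [[float(t), 2.0] for t in range(6)]
    features = three_channel(toy)
    print(len(features), len(features[0]), len(features[0][0]))
```

A channels-first layout is used here only for readability; a CNN framework may expect a different axis order.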

Keywords