Dianxin kexue (May 2024)

Dual-feature speech emotion recognition fusion algorithm based on wavelet scattering transform and MFCC

  • YING Na,
  • WU Shunpeng,
  • YANG Meng,
  • ZOU Yujian

Journal volume & issue
Vol. 40
pp. 62 – 72

Abstract


A fusion algorithm named permutation entropy weighted and bias adjustment rule fusion (PEW-BAR) was proposed to improve the accuracy of speech emotion recognition by exploiting the emotional information carried in the spectral characteristics of speech signals. The algorithm combines the wavelet scattering transform with Mel-frequency cepstral coefficients (MFCC). First, wavelet scattering features and MFCC-related features were extracted from the speech signals. Next, the wavelet scattering features were expanded along the scale dimension, and support vector machines were applied to them to obtain posterior probabilities for emotion recognition. Permutation entropy was then calculated, and a weighted fusion based on this entropy was applied. Finally, a bias adjustment rule was used to refine the integration results obtained from the MFCC-related features. In experiments on the EMODB, RAVDESS, and eNTERFACE05 datasets, the proposed algorithm outperformed traditional wavelet scattering coefficient-based methods, improving accuracy by 2.82%, 2.85%, and 5.92%, respectively, and unweighted average recall (UAR) by 3.40%, 2.87%, and 5.80%, respectively. It also achieved a 6.89% improvement on the IEMOCAP dataset.
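The entropy-weighted fusion step described above might look roughly like the following sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which sequence the permutation entropy is computed on, nor how entropy is mapped to a weight. Here the entropy is the standard normalized Bandt-Pompe permutation entropy, each scale is assumed to have its own posterior vector and entropy value, and the weight is assumed to be proportional to one minus the normalized entropy. The function names `permutation_entropy` and `entropy_weighted_fusion` and the parameters `order` and `delay` are hypothetical. The bias adjustment rule is not sketched because the abstract gives no details about it.

```python
import math

import numpy as np


def permutation_entropy(x, order=3, delay=1):
    """Normalized Bandt-Pompe permutation entropy, in the range [0, 1]."""
    x = np.asarray(x, dtype=float)
    n_windows = len(x) - (order - 1) * delay
    if n_windows <= 0:
        raise ValueError("sequence too short for the chosen order and delay")
    counts = {}
    for i in range(n_windows):
        # Take `order` samples spaced `delay` apart, starting at index i.
        window = x[i : i + order * delay : delay]
        # The ordinal pattern is the index order that sorts the window.
        pattern = tuple(np.argsort(window, kind="stable"))
        counts[pattern] = counts.get(pattern, 0) + 1
    probs = np.array(list(counts.values()), dtype=float) / n_windows
    entropy = -np.sum(probs * np.log(probs))
    # Divide by log(order!) so the result does not depend on the order.
    return max(0.0, float(entropy / math.log(math.factorial(order))))


def entropy_weighted_fusion(posteriors, entropies):
    """Fuse per-scale posteriors with weights derived from entropy.

    ASSUMPTION: weight is proportional to (1 - normalized entropy);
    the paper's actual weighting rule may differ.
    """
    posteriors = np.asarray(posteriors, dtype=float)  # (n_scales, n_classes)
    raw = 1.0 - np.asarray(entropies, dtype=float)
    if raw.sum() <= 0:
        # Every scale is maximally irregular: fall back to equal weights.
        weights = np.full(len(raw), 1.0 / len(raw))
    else:
        weights = raw / raw.sum()
    return weights @ posteriors
```

In use, the per-scale posteriors would come from the support vector machines mentioned in the abstract, and the predicted emotion would be the class with the largest fused probability, for example `int(np.argmax(entropy_weighted_fusion(posteriors, entropies)))`.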

Keywords