A Novel Heterogeneous Parallel Convolution Bi-LSTM for Speech Emotion Recognition

Huiyun Zhang; Heming Huang; Henry Han

doi:10.3390/app11219897

Applied Sciences (Oct 2021)

Affiliations

Huiyun Zhang: School of Computer Science, Qinghai Normal University, Xining 810008, China
Heming Huang: School of Computer Science, Qinghai Normal University, Xining 810008, China
Henry Han: Department of Computer Science, School of Engineering and Computer Science, Baylor University, One Bear Place #97141, Waco, TX 76798, USA

DOI: https://doi.org/10.3390/app11219897
Journal volume & issue: Vol. 11, no. 21, p. 9897

Abstract

Speech emotion recognition is a substantial component of natural language processing (NLP). It places strict demands on the effectiveness of both feature extraction and the acoustic model. To address these challenges, a Heterogeneous Parallel Convolution Bi-LSTM model is proposed. The model consists of two heterogeneous branches: the left branch contains two dense layers and a Bi-LSTM layer, while the right branch contains a dense layer, a convolution layer, and a Bi-LSTM layer. This design can exploit spatiotemporal information more effectively, and it achieves unweighted average recalls of 84.65%, 79.67%, and 56.50% on the benchmark databases EMODB, CASIA, and SAVEE, respectively. Compared with previously reported results, the proposed model consistently achieves better performance.
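
The abstract reports its results as unweighted average recall (UAR), which is the mean of the per-class recalls, so every emotion class counts equally regardless of how many utterances it has. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code: the function name `unweighted_average_recall` and the toy labels are assumptions made for illustration and do not come from EMODB, CASIA, or SAVEE.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (hypothetical helper, not from the paper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # number of correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1

    # Recall of a class = correct predictions for it / samples that truly belong to it.
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy, imbalanced labels invented for illustration only.
y_true = ["angry"] * 6 + ["happy"] * 2 + ["sad"] * 2
y_pred = ["angry"] * 5 + ["happy"] + ["happy", "sad"] + ["sad"] * 2

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.4f}, accuracy = {accuracy:.4f}")
```

On imbalanced data such as this toy example, UAR and plain accuracy differ, because accuracy is dominated by the largest class while UAR weights each class recall equally.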

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
