Research on Chinese Speech Emotion Recognition Based on Deep Neural Network and Acoustic Features

Ming-Che Lee; Sheng-Cheng Yeh; Jia-Wei Chang; Zhen-Yi Chen

doi:10.3390/s22134744

Sensors (Jun 2022)

Affiliations

Ming-Che Lee: Department of Computer and Communication Engineering, Ming Chuan University, Taoyuan 333, Taiwan
Sheng-Cheng Yeh: Department of Computer and Communication Engineering, Ming Chuan University, Taoyuan 333, Taiwan
Jia-Wei Chang: Department of Computer Science and Information Engineering, National Taichung University of Science and Technology, Taichung City 404, Taiwan
Zhen-Yi Chen: Department of Computer and Communication Engineering, Ming Chuan University, Taoyuan 333, Taiwan

DOI: https://doi.org/10.3390/s22134744
Journal volume & issue: Vol. 22, no. 13, p. 4744

Abstract

In recent years, the use of artificial intelligence for emotion recognition has attracted considerable attention. Emotion recognition has broad industrial applicability and strong development potential. This research applies speech emotion recognition technology to Chinese speech. Its main purpose is to help the increasingly popular smart home voice assistants and AI service robots move from touch-based interfaces to voice operation. The research proposes a specifically designed Deep Neural Network (DNN) model for a Chinese speech emotion recognition system, using 29 acoustic features drawn from acoustic theory as the training attributes. It also proposes several audio adjustment methods to augment the dataset and improve training accuracy, including waveform adjustment, pitch adjustment, and pre-emphasis. The study achieved an average emotion recognition accuracy of 88.9% on the CASIA Chinese sentiment corpus. The results show that the proposed deep learning model and audio adjustment methods can effectively identify the emotions in Chinese short sentences and can be applied to Chinese voice assistants or integrated with other dialogue applications.
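
As a rough illustration of the kind of audio adjustment the abstract mentions, the sketch below shows pre-emphasis and a simple amplitude-based waveform adjustment in NumPy. It is not the authors' implementation: the function names, the pre-emphasis coefficient of 0.97, the gain values, and the assumption that samples are floats in [-1, 1] are all illustrative choices, and pitch adjustment is left out because it would normally rely on a dedicated audio library.

```python
import numpy as np


def pre_emphasis(signal, coeff=0.97):
    """First-order pre-emphasis filter: y[0] = x[0], y[n] = x[n] - coeff * x[n-1].

    The coefficient 0.97 is a common default, assumed here for illustration.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size == 0:
        return signal.copy()
    return np.concatenate(([signal[0]], signal[1:] - coeff * signal[:-1]))


def scale_amplitude(signal, gain):
    """Hypothetical waveform adjustment: scale the samples by a gain factor.

    Samples are assumed to lie in [-1, 1]; results are clipped to that range.
    """
    signal = np.asarray(signal, dtype=np.float64)
    return np.clip(signal * gain, -1.0, 1.0)


def augment(signal, gains=(0.8, 1.2), coeff=0.97):
    """Return the original signal plus its adjusted copies.

    The output holds the original, one gain-scaled copy per entry in
    `gains`, and one pre-emphasized copy, in that order.
    """
    original = np.asarray(signal, dtype=np.float64)
    variants = [original]
    variants += [scale_amplitude(original, g) for g in gains]
    variants.append(pre_emphasis(original, coeff))
    return variants
```

With the default arguments, each input utterance yields four training signals (the original, two gain-scaled copies, and one pre-emphasized copy), which is one simple way a dataset could be enlarged before acoustic features are extracted.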

Published in Sensors

ISSN: 1424-8220 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/sensors
