Effect on speech emotion classification of a feature selection approach using a convolutional neural network

Ammar Amjad; Lal Khan; Hsien-Tsung Chang

doi:10.7717/peerj-cs.766

PeerJ Computer Science (Nov 2021)

Affiliations

Ammar Amjad: Department of Computer Science and Information Engineering, Chang Gung University, Taoyuan, Taiwan
Lal Khan: Department of Computer Science and Information Engineering, Chang Gung University, Taoyuan, Taiwan
Hsien-Tsung Chang: Department of Computer Science and Information Engineering, Chang Gung University, Taoyuan, Taiwan

DOI: https://doi.org/10.7717/peerj-cs.766
Journal volume & issue: Vol. 7, p. e766

Abstract

Speech emotion recognition (SER) is a challenging issue because it is not clear which features are effective for classification. Emotionally related features are always extracted from speech signals for emotional classification. Handcrafted features are mainly used for emotional identification from audio signals. However, these features are not sufficient to correctly identify the emotional state of the speaker. The advantages of a deep convolutional neural network (DCNN) are investigated in the proposed work. A pretrained framework is used to extract the features from speech emotion databases. In this work, we adopt the feature selection (FS) approach to find the discriminative and most important features for SER. Many algorithms are used for the emotion classification problem. We use the random forest (RF), decision tree (DT), support vector machine (SVM), multilayer perceptron classifier (MLP), and k-nearest neighbors (KNN) to classify seven emotions. All experiments are performed by utilizing four different publicly accessible databases. Our method obtains accuracies of 92.02%, 88.77%, 93.61%, and 77.23% for Emo-DB, SAVEE, RAVDESS, and IEMOCAP, respectively, for speaker-dependent (SD) recognition with the feature selection method. Furthermore, compared to current handcrafted feature-based SER methods, the proposed method shows the best results for speaker-independent SER. For Emo-DB, all classifiers attain an accuracy of more than 80% with or without the feature selection technique.
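The pipeline summarized in the abstract (extract features, select the most discriminative ones, then classify) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' method: the abstract does not name the feature selection algorithm, so a simple ANOVA F-score filter stands in for it; synthetic Gaussian vectors stand in for the pretrained DCNN features; and only k-nearest neighbors, one of the five classifiers listed, is shown. The helper names (`make_toy_features`, `anova_f_scores`, `select_top_k`, `knn_loo_accuracy`) are hypothetical.

```python
import random
from collections import Counter


def make_toy_features(n_per_class=30, n_classes=3, n_informative=5,
                      n_noise=40, seed=0):
    """Synthetic stand-in for DCNN features (assumption, not real speech data).

    The first n_informative columns carry class signal; the rest are noise.
    """
    rng = random.Random(seed)
    X, y = [], []
    for c in range(n_classes):
        for _ in range(n_per_class):
            informative = [rng.gauss(4.0 * c, 1.0) for _ in range(n_informative)]
            noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
            X.append(informative + noise)
            y.append(c)
    return X, y


def anova_f_scores(X, y):
    """One-way ANOVA F statistic per feature column (higher = more discriminative)."""
    classes = sorted(set(y))
    n, d = len(X), len(X[0])
    scores = []
    for j in range(d):
        col = [row[j] for row in X]
        grand_mean = sum(col) / n
        between = within = 0.0
        for c in classes:
            vals = [v for v, label in zip(col, y) if label == c]
            mean_c = sum(vals) / len(vals)
            between += len(vals) * (mean_c - grand_mean) ** 2
            within += sum((v - mean_c) ** 2 for v in vals)
        scores.append((between / (len(classes) - 1)) / (within / (n - len(classes))))
    return scores


def select_top_k(scores, k):
    """Indices of the k highest-scoring features, in ascending index order."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def knn_loo_accuracy(X, y, cols, k=3):
    """Leave-one-out accuracy of k-NN using only the given feature columns."""
    cols = list(cols)
    correct = 0
    for i, xi in enumerate(X):
        dists = []
        for m, xm in enumerate(X):
            if m != i:
                dist = sum((xi[j] - xm[j]) ** 2 for j in cols)
                dists.append((dist, y[m]))
        dists.sort(key=lambda t: t[0])
        votes = Counter(label for _, label in dists[:k])
        if votes.most_common(1)[0][0] == y[i]:
            correct += 1
    return correct / len(X)


X, y = make_toy_features()
scores = anova_f_scores(X, y)
# For brevity the filter sees all samples; a careful evaluation would
# run the selection inside each held-out fold to avoid leakage.
selected = select_top_k(scores, k=5)
acc_all = knn_loo_accuracy(X, y, cols=range(len(X[0])))
acc_fs = knn_loo_accuracy(X, y, cols=selected)
print("selected columns:", selected)
print(f"k-NN accuracy, all features: {acc_all:.3f}")
print(f"k-NN accuracy, selected features: {acc_fs:.3f}")
```

The toy data is built so that the filter should recover the informative columns. The printed accuracies describe this synthetic setup only and say nothing about the figures reported for Emo-DB, SAVEE, RAVDESS, or IEMOCAP.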

Published in PeerJ Computer Science

ISSN: 2376-5992 (Online)
Publisher: PeerJ Inc.
Country of publisher: United States
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://peerj.com/computer-science/
