Speech Databases, Speech Features, and Classifiers in Speech Emotion Recognition: A Review

G. H. Mohmad Dar; Radhakrishnan Delhibabu

doi:10.1109/ACCESS.2024.3476960

IEEE Access (Jan 2024)

Affiliations

G. H. Mohmad Dar: School of Advanced Sciences, Vellore Institute of Technology, Vellore, Tamil Nadu, India
Radhakrishnan Delhibabu: School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India

DOI: https://doi.org/10.1109/ACCESS.2024.3476960
Journal volume & issue: Vol. 12, pp. 151122–151152

Abstract

Emotion recognition from speech signals plays a crucial role in Human-Machine Interaction (HMI), particularly in the development of applications such as affective computing and interactive systems. This review provides an in-depth examination of current methodologies in speech emotion recognition (SER), with a focus on databases, feature extraction techniques, and classification models. Earlier SER work relied on low-level descriptors (LLDs) such as Mel-Frequency Cepstral Coefficients (MFCCs), linear predictive coding (LPC), and pitch-based features, paired with classifiers such as Support Vector Machines (SVMs), Random Forests (RFs), and Gaussian Mixture Models (GMMs). The development of deep learning techniques has since transformed the field: models such as convolutional neural networks (CNNs) and long short-term memory (LSTM) networks have proven better at capturing the complex temporal and spectral features of speech. This paper reviews prominent speech emotion datasets, exploring their linguistic diversity, annotation processes, and emotional labels. It also analyzes the efficacy of different speech features and classifiers in handling challenges such as data imbalance, limited data availability, and cross-lingual variations. The review highlights the need for future work to address real-time processing, context-sensitive emotion detection, and the integration of multi-modal data to enhance the performance of SER systems. By consolidating recent advancements and identifying areas for further research, this paper aims to provide a clearer path for optimizing feature extraction and classification techniques in the field of emotion recognition.

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
