Data augmentation using a 1D-CNN model with MFCC/MFMC features for speech emotion recognition

Thomas Mary Little Flower; Thirasama Jaya; Sreedharan Christopher Ezhil Singh

doi:10.1080/00051144.2024.2371249

Automatika (Oct 2024)

Affiliations

Thomas Mary Little Flower: Department of ECE, St. Xavier’s Catholic College of Engineering, Chunkankadai, India
Thirasama Jaya: Department of ECE, Saveetha College of Engineering, Chennai, India
Sreedharan Christopher Ezhil Singh: Department of Mechanical Engineering, Vimal Jyothi Engineering College, Kannur, India

DOI: https://doi.org/10.1080/00051144.2024.2371249
Journal volume & issue: Vol. 65, no. 4
pp. 1325 – 1338

Abstract

Speech emotion recognition (SER) is of interest in several domains, such as automated translation, call centres, intelligent healthcare, and human–computer interaction. Deep learning models for emotion identification need a considerable amount of labelled data, which is not always available in the SER field. Identifying emotions efficiently requires enough speech samples, good features, and a good classifier. This study uses data augmentation to increase the number of input voice samples and address the data shortage. The augmentation enlarges the database by adding white noise to the speech signals. In this work, Mel-frequency Cepstral Coefficient (MFCC) and Mel-frequency Magnitude Coefficient (MFMC) features, along with a one-dimensional convolutional neural network (1D-CNN), are used to classify speech emotions. The datasets used to evaluate the model's performance were AESDD, CAFE, EmoDB, IEMOCAP, and MESD. The 1D-CNN (MFMC) model with data augmentation performed best, with an average accuracy of 99.2% for AESDD, 99.5% for CAFE, 97.5% for EmoDB, 92.4% for IEMOCAP, and 96.9% for MESD. The proposed 1D-CNN (MFMC) with data augmentation outperforms the 1D-CNN (MFCC) without data augmentation in emotion recognition.
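The white-noise augmentation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the noise level, so the `snr_db` parameter, the Gaussian noise model, and the function names `add_white_noise` and `augment` are all assumptions made for illustration, and a synthetic tone stands in for a real speech clip.

```python
import math
import random


def add_white_noise(signal, snr_db=20.0, rng=None):
    """Return a noisy copy of `signal` (a list of float samples).

    Assumption: zero-mean Gaussian white noise scaled to a target
    signal-to-noise ratio in dB. The paper's actual noise level is
    not stated in the abstract.
    """
    if rng is None:
        rng = random.Random()
    # Average power of the clean signal.
    signal_power = sum(x * x for x in signal) / len(signal)
    # Noise standard deviation that yields the requested SNR.
    noise_std = math.sqrt(signal_power / 10.0 ** (snr_db / 10.0))
    return [x + rng.gauss(0.0, noise_std) for x in signal]


def augment(signals, snr_db=20.0, seed=0):
    """Return the original signals followed by one noisy copy of each."""
    rng = random.Random(seed)
    noisy = [add_white_noise(s, snr_db, rng) for s in signals]
    return list(signals) + noisy


# Synthetic stand-in for a speech clip: 1 s of a 220 Hz tone at 16 kHz.
tone = [math.sin(2 * math.pi * 220 * n / 16000) for n in range(16000)]
dataset = augment([tone], snr_db=20.0)
print(len(dataset))  # one original clip plus one noisy copy
```

Each noisy copy keeps the emotion label of its source clip, so in this sketch the labelled set doubles in size before feature extraction (MFCC or MFMC) and classification.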

Published in Automatika

ISSN: 0005-1144 (Print); 1848-3380 (Online)
Publisher: Taylor & Francis Group
Country of publisher: United Kingdom
LCC subjects: Technology: Mechanical engineering and machinery: Control engineering systems. Automatic machinery (General); Technology: Technology (General): Industrial engineering. Management engineering: Automation
Website: https://www.tandfonline.com/journals/taut
