Journal of Modeling in Engineering (Mar 2023)

Data Augmentation and Effective Feature Selection in Generative Adversarial Networks for Speech Emotion Recognition

  • Arash Shilandari,
  • Hossein Marvi,
  • Hossein Khosravi

DOI
https://doi.org/10.22075/jme.2022.24865.2159
Journal volume & issue
Vol. 21, no. 72
pp. 1 – 17

Abstract

It remains unclear whether feature selection methods improve the efficiency of speech emotion recognition (SER) systems. This paper examines feature selection in a generative adversarial network (GAN) that is used to augment data for an SER system. Experiments were performed on four databases: EMO-DB, eNTERFACE05, SAVEE, and IEMOCAP. Simulations were run in Python, and the data analysis on all four databases covered four emotions: sadness, anger, happiness, and neutral. We show that artificial data generated by GANs can be used not only to augment data but also to select features that improve classification performance. We used a GAN to augment the data and applied two feature selection methods, Fisher's method and linear discriminant analysis (LDA), in two steps. A support vector machine (SVM) classified the emotions. Using feedback from the classification network, we brought the SER system to its optimal point in terms of number of samples and feature vector dimensionality. Principal component analysis (PCA) is more effective on correlated data, LDA works better on low-dimensional data, and Fisher's method reduces dimensionality better than PCA. The results showed that using both LDA and Fisher's method in the GAN filters the features into fewer dimensions while preserving the emotional information needed for classification. Compared with recent research, the proposed method achieved 86.32% accuracy on the EMO-DB database.
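
To make the Fisher-based selection step concrete, the following is a minimal sketch of one common form of the Fisher score, in which each feature is ranked by the ratio of between-class scatter to within-class scatter. It is an illustration under stated assumptions, not the authors' implementation: the function names `fisher_scores` and `top_k_features`, the toy data, and the use of plain Python lists are all hypothetical, and the abstract does not specify which variant of the criterion was used.

```python
from collections import defaultdict


def fisher_scores(X, y):
    """Return one Fisher score per feature column of X.

    X is a list of equal-length feature vectors, y the matching class labels.
    Score = between-class scatter / within-class scatter (an assumed,
    commonly used form; not taken from the paper).
    """
    n_features = len(X[0])
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)

    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        overall_mean = sum(column) / len(column)
        between = 0.0
        within = 0.0
        for rows in groups.values():
            values = [r[j] for r in rows]
            n_c = len(values)
            mean_c = sum(values) / n_c
            var_c = sum((v - mean_c) ** 2 for v in values) / n_c
            between += n_c * (mean_c - overall_mean) ** 2
            within += n_c * var_c
        if within > 0:
            scores.append(between / within)
        else:
            # Zero within-class spread: perfectly separable or constant feature.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def top_k_features(X, y, k):
    """Return the indices of the k features with the highest Fisher score."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy example (hypothetical data): feature 0 separates the two emotions,
# feature 1 has the same distribution in both classes.
X = [[0.0, 1.0], [0.2, 3.0], [1.0, 1.0], [1.2, 3.0]]
y = ["sadness", "sadness", "anger", "anger"]
selected = top_k_features(X, y, k=1)
```

In a pipeline like the one described above, the retained columns would then be passed on to the LDA step and the SVM classifier; those stages are omitted here to keep the sketch self-contained.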

Keywords