Doklady Belorusskogo gosudarstvennogo universiteta informatiki i radioèlektroniki (Jun 2024)

Speech Emotion Recognition Method Based on Support Vector Machine and Suprasegmental Acoustic Features

  • D. V. Krasnoproshin,
  • M. I. Vashkevich

DOI
https://doi.org/10.35596/1729-7648-2024-22-3-93-100
Journal volume & issue
Vol. 22, no. 3
pp. 93 – 100

Abstract

The problem of recognizing emotions in a speech signal is studied using mel-frequency cepstral coefficients (MFCCs) and a classifier based on the support vector machine (SVM). The RAVDESS dataset was used in the experiments. A model is proposed that takes a 306-component suprasegmental feature vector as input to the SVM classifier. Model quality was assessed using unweighted average recall (UAR). Linear, polynomial, and radial basis function kernels were considered for the SVM classifier. Different signal analysis frame sizes (from 23 to 341 ms) were investigated at the MFCC extraction stage. The resulting model achieved UAR = 48 %. The proposed approach shows potential for applications such as voice assistants, virtual agents, and mental health diagnostics.
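
Since the abstract reports model quality as UAR, a brief illustration of the metric may help. UAR is the mean of the per-class recalls, so every emotion class contributes equally regardless of how many utterances it has. The following is a minimal sketch in plain Python, not the authors' evaluation code; the function name and the toy labels are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (UAR).

    Each class present in y_true contributes equally,
    no matter how many samples it has.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly recognized samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1

    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy labels (not data from the paper).
y_true = ["calm"] * 8 + ["angry"] * 2
y_pred = ["calm"] * 8 + ["calm", "angry"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.2f}, accuracy = {accuracy:.2f}")
```

In this toy case, plain accuracy is dominated by the larger class, while UAR penalizes the poorly recognized minority class. This is why UAR is a common choice when emotion classes are unevenly represented.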

Keywords