Doklady Belorusskogo gosudarstvennogo universiteta informatiki i radioèlektroniki (Jun 2024)
Speech Emotion Recognition Method Based on Support Vector Machine and Suprasegmental Acoustic Features
Abstract
The problem of recognizing emotions in a speech signal using mel-frequency cepstral coefficients using a classifier based on the support vector machine has been studied. The RAVDESS data set was used in the experiments. A model is proposed that uses a 306-component suprasegmental feature vector as input to a support vector machine classifier. Model quality was assessed using unweighted average recall (UAR). The use of linear, polynomial and radial basis functions as a kernel in a classifier based on the support vector machine is considered. The use of different signal analysis frame sizes (from 23 to 341 ms) at the stage of extracting mel-frequency cepstral coefficients was investigated. The research results revealed significant accuracy of the resulting model (UAR = 48 %). The proposed approach shows potential for applications such as voice assistants, virtual agents, and mental health diagnostics.
Keywords