Proceedings on Engineering Sciences (Sep 2024)
SENTIMENT ANALYSIS ON SPEECH SIGNALS: LEVERAGING MFCC-LSTM TECHNIQUE FOR ENHANCED EMOTIONAL UNDERSTANDING
Abstract
The analysis of emotions expressed in spoken language plays a pivotal role in human communication, artificial intelligence, and human-computer interaction. While emotion recognition in text has seen considerable advances, recognizing emotional states in speech presents distinct challenges and opportunities. This research introduces an approach that harnesses Mel Frequency Cepstral Coefficients (MFCC) and Long Short-Term Memory (LSTM) networks for deep emotion recognition in speech signals. The study explores the potential of the MFCC-LSTM framework, which combines an established audio feature-extraction method with deep learning: MFCCs offer a powerful representation of spectral features over time, while LSTM networks excel at modeling temporal dependencies. The system classifies emotional states such as sadness, anger, neutrality, and happiness from the speaker's utterances. Several performance assessments were carried out on the proposed MFCC-LSTM model, which shows a significant improvement in recognition rate over currently available models, reaching 96% recognition accuracy.
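The pipeline the abstract describes, MFCC feature extraction followed by an LSTM classifier over the four named emotion classes, can be sketched as follows. This is a minimal illustration using librosa and Keras under assumed hyperparameters (16 kHz sampling rate, 40 MFCCs, 64 LSTM units), not the authors' reported implementation.

```python
# Minimal MFCC-LSTM sketch: librosa extracts per-frame MFCCs,
# a Keras LSTM models their temporal evolution, and a softmax
# layer classifies the utterance. All hyperparameters below are
# illustrative assumptions, not values from the paper.
import librosa
import tensorflow as tf

EMOTIONS = ["sadness", "anger", "neutral", "happiness"]  # the four classes named above

def extract_mfcc(path, sr=16000, n_mfcc=40):
    """Load an utterance and return its MFCC sequence, shape (frames, n_mfcc)."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.T  # transpose to time-major: one MFCC vector per frame

def build_model(n_mfcc=40, n_classes=len(EMOTIONS)):
    """Run an LSTM over the MFCC frames; classify from its final state."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(None, n_mfcc)),  # variable-length sequences
        tf.keras.layers.LSTM(64),                     # temporal dependency modeling
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Using the LSTM's final state as the utterance-level representation is one common design choice for sequence classification; the frame-wise MFCC matrix preserves the spectral-over-time structure the abstract attributes to the features.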
Keywords