IEEE Access (Jan 2022)
Hybrid LSTM-Transformer Model for Emotion Recognition From Speech Audio Files
Abstract
Emotion is a vital component of daily human communication that helps people understand one another, and emotion recognition plays a crucial role in advancing human-computer interaction. Speech Emotion Recognition (SER) identifies the emotional signals conveyed in human speech or daily conversation, where the expressed emotion depends strongly on temporal information. Although much existing research has shown that hybrid systems outperform the traditional single classifiers used in SER, each approach has its own limitations. This paper therefore proposes a hybrid of a Long Short-Term Memory (LSTM) network and a Transformer encoder that learns the long-term dependencies in speech signals and classifies emotions. Speech features are extracted as Mel-Frequency Cepstral Coefficients (MFCCs) and fed into the proposed hybrid LSTM-Transformer classifier. A range of performance evaluations was conducted on the proposed model, and the results show a significant recognition improvement over models reported in other published works. The proposed hybrid model achieved recognition accuracies of 75.62%, 85.55%, and 72.49% on the RAVDESS, Emo-DB, and language-independent datasets, respectively.
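To make the described pipeline concrete, the following is a minimal PyTorch sketch (not from the paper) of an MFCC front end feeding an LSTM followed by a Transformer encoder; every hyperparameter here (40 MFCCs, 128 hidden units, 4 attention heads, 2 encoder layers, 8 emotion classes) is an illustrative assumption, not the authors' reported configuration.

    import torch
    import torch.nn as nn
    import librosa

    class LSTMTransformer(nn.Module):
        # Hypothetical sketch: an LSTM models local temporal dynamics, a
        # Transformer encoder captures long-range dependencies across frames,
        # and a linear head classifies the time-pooled representation.
        def __init__(self, n_mfcc=40, hidden=128, n_heads=4,
                     n_layers=2, n_classes=8):
            super().__init__()
            self.lstm = nn.LSTM(n_mfcc, hidden, num_layers=2,
                                batch_first=True)
            layer = nn.TransformerEncoderLayer(d_model=hidden,
                                               nhead=n_heads,
                                               batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
            self.head = nn.Linear(hidden, n_classes)

        def forward(self, mfcc):              # mfcc: (batch, frames, n_mfcc)
            x, _ = self.lstm(mfcc)            # (batch, frames, hidden)
            x = self.encoder(x)               # self-attention over all frames
            return self.head(x.mean(dim=1))   # mean-pool over time -> logits

    # Usage: extract MFCC features from a waveform and classify.
    wav, sr = librosa.load("speech.wav", sr=16000)           # hypothetical file
    mfcc = librosa.feature.mfcc(y=wav, sr=sr, n_mfcc=40).T   # (frames, 40)
    logits = LSTMTransformer()(torch.tensor(mfcc).unsqueeze(0))  # (1, 8)

Placing the LSTM before the Transformer encoder, as sketched here, lets the recurrent layer summarize local frame-level dynamics while the self-attention layers relate distant parts of the utterance, which matches the paper's motivation of learning long-term dependencies in speech.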
Keywords