Speech Emotion and Naturalness Recognitions With Multitask and Single-Task Learnings

Bagus Tris Atmaja; Akira Sasou; Masato Akagi

doi:10.1109/ACCESS.2022.3189481

IEEE Access (Jan 2022)

Affiliations

Bagus Tris Atmaja: National Institute of Advanced Industrial Science and Technology, Tsukuba, Japan
Akira Sasou: National Institute of Advanced Industrial Science and Technology, Tsukuba, Japan
Masato Akagi: Japan Advanced Institute of Science and Technology, Nomi, Japan

Journal volume & issue: Vol. 10, pp. 72381–72387

Abstract

This paper evaluates speech emotion and naturalness recognition using deep learning models with multitask and single-task learning approaches. The emotion model covers the valence, arousal, and dominance attributes, known collectively as dimensional emotion. The naturalness ratings are labeled on a five-point scale, in the same way as dimensional emotion. Multitask learning predicts dimensional emotion (the main task) and naturalness scores (the auxiliary task) simultaneously. Single-task learning predicts either dimensional emotion (valence, arousal, and dominance) or the naturalness score independently. The multitask learning results improve on previous single-task learning studies for both dimensional emotion recognition and naturalness prediction. Within this study, however, single-task learning still outperforms multitask learning for naturalness recognition. Scatter plots of the predicted emotion and naturalness scores against the true labels in multitask learning reveal a limitation of the model: it fails to predict low and extremely high scores. The low score for naturalness prediction in this study is possibly due to the small number of unnatural speech samples, since the MSP-IMPROV dataset promotes naturalness of speech. The finding that jointly predicting naturalness with emotion improves emotion recognition performance may be incorporated into emotion recognition models in future work.
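
The difference between the two training setups described in the abstract can be illustrated with a minimal sketch. The abstract does not state the loss function or task weighting used in the paper, so everything below is an assumption made for illustration: the mean squared error, the `TASK_WEIGHTS` values, and the function names `multitask_loss` and `single_task_loss` are hypothetical and are not taken from the paper.

```python
from typing import Dict, Mapping, Sequence


def mse(pred: Sequence[float], true: Sequence[float]) -> float:
    """Mean squared error between two equal-length score sequences."""
    if len(pred) != len(true) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred)


# Hypothetical weights: the three emotion attributes form the main task,
# naturalness is the auxiliary task and is down-weighted here (assumption).
TASK_WEIGHTS: Dict[str, float] = {
    "valence": 1.0,
    "arousal": 1.0,
    "dominance": 1.0,
    "naturalness": 0.5,
}


def multitask_loss(
    preds: Mapping[str, Sequence[float]],
    labels: Mapping[str, Sequence[float]],
    weights: Mapping[str, float] = TASK_WEIGHTS,
) -> float:
    """Weighted sum of per-task errors; all tasks are optimized jointly."""
    return sum(w * mse(preds[task], labels[task]) for task, w in weights.items())


def single_task_loss(
    preds: Mapping[str, Sequence[float]],
    labels: Mapping[str, Sequence[float]],
    task: str,
) -> float:
    """Error for one task only; the other tasks are ignored."""
    return mse(preds[task], labels[task])


if __name__ == "__main__":
    # Made-up scores on a five-point scale, two utterances per task.
    preds = {
        "valence": [3.0, 4.0],
        "arousal": [2.0, 2.0],
        "dominance": [3.0, 3.0],
        "naturalness": [4.0, 5.0],
    }
    labels = {
        "valence": [3.0, 3.0],
        "arousal": [2.0, 2.0],
        "dominance": [3.0, 3.0],
        "naturalness": [4.0, 3.0],
    }
    print(multitask_loss(preds, labels))
    print(single_task_loss(preds, labels, "naturalness"))
```

In the multitask case a single model receives a training signal from all four scores at once, whereas in the single-task case each model is trained against one objective only. The weighting between main and auxiliary tasks is a design choice that would need tuning in practice.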

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
