Adapting Multiple Distributions for Bridging Emotions from Different Speech Corpora

Yuan Zong; Hailun Lian; Hongli Chang; Cheng Lu; Chuangao Tang

doi:10.3390/e24091250

Entropy (Sep 2022)

Affiliations

Yuan Zong: Key Laboratory of Child Development and Learning Science of Ministry of Education, Southeast University, Nanjing 210096, China
Hailun Lian: Key Laboratory of Child Development and Learning Science of Ministry of Education, Southeast University, Nanjing 210096, China
Hongli Chang: Key Laboratory of Child Development and Learning Science of Ministry of Education, Southeast University, Nanjing 210096, China
Cheng Lu: Key Laboratory of Child Development and Learning Science of Ministry of Education, Southeast University, Nanjing 210096, China
Chuangao Tang: Key Laboratory of Child Development and Learning Science of Ministry of Education, Southeast University, Nanjing 210096, China

DOI: https://doi.org/10.3390/e24091250
Journal volume & issue: Vol. 24, no. 9, p. 1250

Abstract

In this paper, we focus on a challenging, but interesting, task in speech emotion recognition (SER), i.e., cross-corpus SER. Unlike conventional SER, a feature distribution mismatch may exist between the labeled source (training) and target (testing) speech samples in cross-corpus SER because they come from different speech emotion corpora, which degrades the performance of most well-performing SER methods. To address this issue, we propose a novel transfer subspace learning method called multiple distribution-adapted regression (MDAR) to bridge the gap between speech samples from different corpora. Specifically, MDAR aims to learn a projection matrix to build the relationship between the source speech features and emotion labels. A novel regularization term called multiple distribution adaption (MDA), consisting of a marginal and two conditional distribution-adapted operations, is designed to collaboratively enable such a discriminative projection matrix to be applicable to the target speech samples, regardless of speech corpus variance. Consequently, by resorting to the learned projection matrix, we are able to predict the emotion labels of target speech samples when only the source label information is given. To evaluate the proposed MDAR method, extensive cross-corpus SER tasks based on three different speech emotion corpora, i.e., EmoDB, eNTERFACE, and CASIA, were designed. Experimental results showed that the proposed MDAR outperformed most recent state-of-the-art transfer subspace learning methods and even performed better than several well-performing deep transfer learning methods in dealing with cross-corpus SER tasks.
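To make the notion of a feature distribution mismatch between corpora concrete, the sketch below computes one simple statistic: the squared distance between the mean feature vectors of a source set and a target set. This is only an illustration of a marginal mismatch measure under simplifying assumptions; it is not the MDA regularization term of the paper, whose exact form the abstract does not give. The function names and the toy feature values are hypothetical.

```python
def feature_mean(samples):
    """Return the per-dimension mean of a list of equal-length feature vectors."""
    n = len(samples)
    dim = len(samples[0])
    return [sum(x[d] for x in samples) / n for d in range(dim)]


def mean_discrepancy(source, target):
    """Squared Euclidean distance between the source and target feature means.

    Illustrative stand-in for a marginal distribution mismatch measure;
    not the formulation used by MDAR.
    """
    mu_s = feature_mean(source)
    mu_t = feature_mean(target)
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))


# Toy 2-D "speech features" (made-up numbers, not taken from any corpus).
source = [[0.0, 1.0], [2.0, 3.0]]
target = [[1.0, 1.0], [3.0, 5.0]]
gap = mean_discrepancy(source, target)
```

A value of zero means the two sets share the same feature mean; larger values indicate a larger shift between corpora. Higher-order differences between distributions are not captured by this statistic.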

Published in Entropy

ISSN: 1099-4300 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Astronomy: Astrophysics; Science: Physics
Website: http://www.mdpi.com/journal/entropy
