Progressive distribution adapted neural networks for cross-corpus speech emotion recognition

Yuan Zong; Hailun Lian; Jiacheng Zhang; Ercui Feng; Cheng Lu; Hongli Chang; Chuangao Tang

doi:10.3389/fnbot.2022.987146

Frontiers in Neurorobotics (Sep 2022)

Affiliations

Yuan Zong: Key Laboratory of Child Development and Learning Science of Ministry of Education, Southeast University, Nanjing, China
Yuan Zong: School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
Hailun Lian: Key Laboratory of Child Development and Learning Science of Ministry of Education, Southeast University, Nanjing, China
Jiacheng Zhang: Key Laboratory of Child Development and Learning Science of Ministry of Education, Southeast University, Nanjing, China
Jiacheng Zhang: School of Cyber Science and Engineering, Southeast University, Nanjing, China
Ercui Feng: Affiliated Jiangning Hospital, Nanjing Medical University, Nanjing, China
Cheng Lu: Key Laboratory of Child Development and Learning Science of Ministry of Education, Southeast University, Nanjing, China
Hongli Chang: Key Laboratory of Child Development and Learning Science of Ministry of Education, Southeast University, Nanjing, China
Chuangao Tang: Key Laboratory of Child Development and Learning Science of Ministry of Education, Southeast University, Nanjing, China
Chuangao Tang: School of Biological Science and Medical Engineering, Southeast University, Nanjing, China

Journal volume & issue: Vol. 16

Abstract

In this paper, we investigate a challenging but interesting task in speech emotion recognition (SER) research: cross-corpus SER. Unlike conventional SER, the training (source) and testing (target) samples in cross-corpus SER come from different speech corpora, which results in a feature distribution mismatch between them. As a result, the performance of most existing SER methods may decrease sharply. To cope with this problem, we propose a simple yet effective deep transfer learning method called progressive distribution adapted neural networks (PDAN). PDAN employs convolutional neural networks (CNNs) as the backbone and the speech spectrum as the input, yielding an end-to-end learning framework. Its basic idea for solving cross-corpus SER is straightforward: it enhances the backbone's ability to learn corpus-invariant features by incorporating a progressive distribution adapted regularization term into the original loss function to guide network training. To evaluate the proposed PDAN, we conduct extensive cross-corpus SER experiments on speech emotion corpora including EmoDB, eNTERFACE, and CASIA. The experimental results show that PDAN outperforms most well-performing deep and subspace transfer learning methods on cross-corpus SER tasks.
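
The general idea of adding a distribution-matching regularizer to a classification loss can be illustrated with a minimal sketch. The abstract does not give the form of PDAN's regularization term, so everything below is an assumption made for illustration: the penalty is a simple squared distance between source and target feature means, the function names and the `weight` parameter are hypothetical, and the "progressive" aspect of PDAN is not modeled.

```python
import math

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy of integer class labels under softmax(logits)."""
    total = 0.0
    for row, label in zip(logits, labels):
        peak = max(row)  # subtract the max for numerical stability
        log_norm = peak + math.log(sum(math.exp(v - peak) for v in row))
        total += log_norm - row[label]
    return total / len(labels)

def feature_mean(feats):
    """Per-dimension mean of a batch of feature vectors."""
    return [sum(col) / len(feats) for col in zip(*feats)]

def mean_discrepancy(source_feats, target_feats):
    """Squared Euclidean distance between source and target feature means.

    Assumption: a stand-in mismatch measure, not the term used in the paper.
    """
    mu_s = feature_mean(source_feats)
    mu_t = feature_mean(target_feats)
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))

def regularized_loss(logits, labels, source_feats, target_feats, weight=1.0):
    """Source classification loss plus a weighted distribution-mismatch penalty."""
    penalty = mean_discrepancy(source_feats, target_feats)
    return softmax_cross_entropy(logits, labels) + weight * penalty

# Toy batch: two labeled source samples and two unlabeled target samples.
logits = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
labels = [0, 2]
source_feats = [[0.0, 1.0], [2.0, 3.0]]
target_feats = [[2.0, 3.0], [4.0, 5.0]]  # source features shifted by +2 per dimension
loss = regularized_loss(logits, labels, source_feats, target_feats, weight=0.5)
```

In this toy setup the penalty is zero when the source and target feature means coincide and grows as they drift apart, so minimizing the combined loss pushes a network toward features that look alike across corpora while still classifying the labeled source samples.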

Published in Frontiers in Neurorobotics

ISSN: 1662-5218 (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Medicine: Internal medicine: Neurosciences. Biological psychiatry. Neuropsychiatry
Website: https://www.frontiersin.org/journals/neurorobotics/
