IEEE Access (Jan 2024)

Multilingual Meta-Transfer Learning for Low-Resource Speech Recognition

  • Rui Zhou,
  • Takaki Koshikawa,
  • Akinori Ito,
  • Takashi Nose,
  • Chia-Ping Chen

DOI: https://doi.org/10.1109/ACCESS.2024.3486711
Journal volume & issue: Vol. 12, pp. 158493–158504

Abstract

This paper proposes a novel meta-transfer learning method to improve automatic speech recognition (ASR) performance in low-resource languages. Low-resource ASR has attracted considerable interest, with the goal of delivering feasible and reliable systems from very limited data; the central challenge is designing a methodology that addresses this data scarcity. Our proposed meta-transfer learning approach combines two well-known machine-learning methods, transfer learning and meta-learning, and their integration alleviates the training bottlenecks and overfitting that arise when pre-training models on low-resource speech data. For evaluation, we conduct extensive multilingual ASR experiments on the Common Voice and GlobalPhone corpora and compare the meta-transfer learning, meta-learning, and transfer learning methods. In the low-resource experiments, the proposed meta-transfer learning achieves a relative character error rate (CER) reduction of 11.62% over meta-learning and 10.86% over transfer learning. In the near-zero-resource experiments, which use less than 15 minutes of data for each target language, our meta-transfer learning approach achieves an average CER 25.25% lower than that of meta-learning and transfer learning. These results demonstrate that the proposed integration works well for ASR in languages with very limited data resources.
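The abstract names the ingredients but not the training procedure. As a rough, hypothetical sketch of how the three stages could fit together, the Python/PyTorch code below pre-trains on pooled high-resource source languages (transfer learning), runs a first-order meta-update across languages (a Reptile-style simplification of MAML, not necessarily the paper's exact algorithm), and finally fine-tunes on the low-resource target language. The toy model, the make_batch stand-in for acoustic features and character targets, and all hyperparameters are illustrative placeholders, not taken from the paper.

    # Hypothetical sketch of a meta-transfer learning pipeline for
    # multilingual ASR; all components are placeholders.
    import copy
    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    def make_batch(n=8, feat_dim=40, n_classes=50):
        """Synthetic stand-in for (acoustic features, character targets)."""
        x = torch.randn(n, feat_dim)
        y = torch.randint(0, n_classes, (n,))
        return x, y

    model = nn.Sequential(nn.Linear(40, 128), nn.ReLU(), nn.Linear(128, 50))
    loss_fn = nn.CrossEntropyLoss()

    # Stage 1: transfer learning -- ordinary pre-training on pooled
    # high-resource source languages.
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(100):
        x, y = make_batch()
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

    # Stage 2: meta-learning -- treat each source language as a task:
    # adapt a clone on a support batch, then move the shared weights
    # toward the adapted ones (first-order, Reptile-style outer update).
    inner_lr, outer_lr, n_languages = 1e-2, 1e-3, 5
    for _ in range(100):
        for _lang in range(n_languages):
            adapted = copy.deepcopy(model)
            inner_opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
            x, y = make_batch()            # support set for this language
            inner_opt.zero_grad()
            loss_fn(adapted(x), y).backward()
            inner_opt.step()
            with torch.no_grad():          # outer update toward adapted weights
                for p, q in zip(model.parameters(), adapted.parameters()):
                    p += outer_lr * (q - p)

    # Stage 3: fine-tune the meta-initialized model on the low-resource
    # target language (e.g., under 15 minutes of speech).
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    for _ in range(20):
        x, y = make_batch()
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

The point of the combination, as the abstract describes it, is that stage 1 supplies a strong multilingual starting point while stage 2 reshapes it into an initialization that adapts quickly from very little target-language data, which is what stage 3 exploits.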

Keywords