Exploring task-diverse meta-learning on Tibetan multi-dialect speech recognition

Yigang Liu; Yue Zhao; Xiaona Xu; Liang Xu; Xubei Zhang; Qiang Ji

doi:10.1186/s13636-024-00361-7

EURASIP Journal on Audio, Speech, and Music Processing (Jul 2024)

Affiliations

Yigang Liu: Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE, Minzu University of China
Yue Zhao: Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE, Minzu University of China
Xiaona Xu: Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE, Minzu University of China
Liang Xu: Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE, Minzu University of China
Xubei Zhang: Linguistics & Computer Science, Boston University
Qiang Ji: Department of Electrical, Computer, and Systems Engineering, Rensselaer Polytechnic Institute

Journal volume & issue: Vol. 2024, no. 1
pp. 1 – 8

Abstract

Differences in phonetics and in available corpora across the three major Tibetan dialects make it difficult for a single-task model trained on one dialect to accommodate the others. To address this issue, this paper proposes task-diverse meta-learning, which allows the model to acquire more comprehensive and robust features and thus to adapt to variation among dialects. The study uses Tibetan dialect ID recognition and Tibetan speaker recognition as the source tasks for meta-learning, with the aim of improving the model’s ability to discriminate differences among dialects and, consequently, its performance on Tibetan multi-dialect speech recognition. The experimental results show that task-diverse meta-learning improves performance on Tibetan multi-dialect speech recognition, demonstrating its effectiveness and applicability and contributing to the advancement of speech recognition in multi-dialect environments.

Published in EURASIP Journal on Audio, Speech, and Music Processing

ISSN: 1687-4722 (Online)
Publisher: SpringerOpen
Country of publisher: United Kingdom
LCC subjects: Science: Physics: Acoustics. Sound; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://asmp-eurasipjournals.springeropen.com
