IEEE Access (Jan 2023)

Meta-Learning for Indian Languages: Performance Analysis and Improvements With Linguistic Similarity Measures

  • C. S. Anoop
  • A. G. Ramakrishnan

DOI
https://doi.org/10.1109/ACCESS.2023.3300790
Journal volume & issue
Vol. 11
pp. 82050–82064

Abstract


Indian languages share considerable overlap in acoustic and linguistic content. Although these languages use different writing systems, their phoneme sets largely overlap. Most of them are low-resourced, lacking enough annotated speech data to build good automatic speech recognition (ASR) systems. The recently proposed model-agnostic meta-learning (MAML) algorithm has shown great success in the fast adaptation of multilingual models to unseen datasets. In this work, we establish the usefulness of MAML pretraining for quickly building reasonably good ASR systems for low-resource Indian languages. MAML significantly outperforms joint multilingual training in its capability for few-shot learning and faster adaptation. On average, MAML yields absolute improvements of 5.4% in CER and 20.3% in WER over joint multilingual pretraining in the fast-adaptation setting with five epochs of fine-tuning. Further, we exploit the similarities of the source transcriptions to the target data through a loss-weighting scheme during training to improve the performance of the MAML models. The similarity-based loss weightings yield average absolute improvements of 0.2% in CER and 1% in WER.
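To make the two ideas in the abstract concrete, here is a minimal first-order MAML (FOMAML) sketch in PyTorch with an optional per-language similarity weight applied to each task's meta-loss. The tiny linear model, synthetic data, `inner_adapt`/`meta_step` helpers, and the `similarity` values are illustrative placeholders only; the paper's actual ASR architecture, training setup, and linguistic similarity measures are not reproduced here.

```python
import copy
import torch
import torch.nn as nn

def inner_adapt(model, support_x, support_y, loss_fn, lr=0.01, steps=1):
    """Clone the model and take a few SGD steps on the support set."""
    adapted = copy.deepcopy(model)
    opt = torch.optim.SGD(adapted.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(adapted(support_x), support_y).backward()
        opt.step()
    return adapted

def meta_step(model, meta_opt, tasks, loss_fn):
    """One meta-update over a batch of language tasks.

    Each task is (support_x, support_y, query_x, query_y, similarity);
    `similarity` scales that language's contribution to the meta-loss,
    a stand-in for the paper's similarity-based loss weighting.
    """
    meta_opt.zero_grad()
    for sx, sy, qx, qy, similarity in tasks:
        adapted = inner_adapt(model, sx, sy, loss_fn)
        query_loss = similarity * loss_fn(adapted(qx), qy)
        grads = torch.autograd.grad(query_loss, adapted.parameters())
        # First-order approximation: apply the adapted model's gradients
        # directly to the meta-parameters (FOMAML), avoiding second-order terms.
        for p, g in zip(model.parameters(), grads):
            p.grad = g if p.grad is None else p.grad + g
    meta_opt.step()

if __name__ == "__main__":
    torch.manual_seed(0)
    model = nn.Linear(8, 4)              # stand-in for an ASR model
    meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    # Two toy "languages" with random data and hypothetical similarity weights.
    tasks = [(torch.randn(16, 8), torch.randint(0, 4, (16,)),
              torch.randn(16, 8), torch.randint(0, 4, (16,)), w)
             for w in (1.0, 0.6)]
    for _ in range(5):
        meta_step(model, meta_opt, tasks, loss_fn)
```

After meta-training, fast adaptation to a new low-resource language amounts to running `inner_adapt` on its small support set for a few epochs, which is the few-shot setting the abstract evaluates.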

Keywords