IEEE Access (Jan 2023)

Meta-Learning for Indian Languages: Performance Analysis and Improvements With Linguistic Similarity Measures

  • C. S. Anoop
  • A. G. Ramakrishnan

DOI
https://doi.org/10.1109/ACCESS.2023.3300790
Journal volume & issue
Vol. 11
pp. 82050–82064

Abstract


Indian languages share considerable overlap in acoustic and linguistic content. Although these languages use different writing systems, their phoneme sets largely overlap. Most of them are low-resourced, lacking enough annotated speech data to build good automatic speech recognition (ASR) systems. The recently proposed model-agnostic meta-learning (MAML) algorithm has shown great success in the fast adaptation of multilingual models to unseen datasets. In this work, we establish the usefulness of MAML pretraining for quickly building reasonably good ASR systems for low-resource Indian languages. MAML significantly outperforms joint multilingual training in its capability for few-shot learning and faster adaptation. On average, MAML yields absolute improvements of 5.4% in CER and 20.3% in WER over joint multilingual pretraining in the fast-adaptation setting with five epochs of fine-tuning. Further, we exploit the similarities of the source transcriptions to the target data through a loss-weighting scheme during training to improve the performance of the MAML models. The similarity-based loss weightings yield average absolute improvements of 0.2% in CER and 1% in WER.
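To make the two ideas in the abstract concrete, here is a minimal first-order MAML (FOMAML) sketch in PyTorch with an optional per-language similarity weight applied to each task's meta-loss. The tiny linear model, synthetic data, `inner_adapt`/`meta_step` helpers, and the `similarity` values are illustrative placeholders only; the paper's actual ASR architecture, training setup, and linguistic similarity measures are not reproduced here.

```python
import copy
import torch
import torch.nn as nn

def inner_adapt(model, support_x, support_y, loss_fn, lr=0.01, steps=1):
    """Clone the model and take a few SGD steps on the support set."""
    adapted = copy.deepcopy(model)
    opt = torch.optim.SGD(adapted.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(adapted(support_x), support_y).backward()
        opt.step()
    return adapted

def meta_step(model, meta_opt, tasks, loss_fn):
    """One meta-update over a batch of language tasks.

    Each task is (support_x, support_y, query_x, query_y, similarity);
    `similarity` scales that language's contribution to the meta-loss,
    a stand-in for the paper's similarity-based loss weighting.
    """
    meta_opt.zero_grad()
    for sx, sy, qx, qy, similarity in tasks:
        adapted = inner_adapt(model, sx, sy, loss_fn)
        query_loss = similarity * loss_fn(adapted(qx), qy)
        grads = torch.autograd.grad(query_loss, adapted.parameters())
        # First-order approximation: apply the adapted model's gradients
        # directly to the meta-parameters (FOMAML), avoiding second-order terms.
        for p, g in zip(model.parameters(), grads):
            p.grad = g if p.grad is None else p.grad + g
    meta_opt.step()

if __name__ == "__main__":
    torch.manual_seed(0)
    model = nn.Linear(8, 4)              # stand-in for an ASR model
    meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    # Two toy "languages" with random data and hypothetical similarity weights.
    tasks = [(torch.randn(16, 8), torch.randint(0, 4, (16,)),
              torch.randn(16, 8), torch.randint(0, 4, (16,)), w)
             for w in (1.0, 0.6)]
    for _ in range(5):
        meta_step(model, meta_opt, tasks, loss_fn)
```

After meta-training, fast adaptation to a new low-resource language amounts to running `inner_adapt` on its small support set for a few epochs, which is the few-shot setting the abstract evaluates.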

Keywords