Applied Sciences (Jul 2024)
Using Transfer Learning to Realize Low Resource Dungan Language Speech Synthesis
Abstract
This article presents a transfer-learning-based method to improve the quality of synthesized speech for the low-resource Dungan language. The improvement is achieved by fine-tuning a pre-trained Mandarin acoustic model into a Dungan acoustic model using a limited Dungan corpus within the Tacotron2+WaveRNN framework. Our method begins with a transformer-based Dungan text analyzer that generates unit sequences with embedded prosodic information from Dungan sentences. These unit sequences, paired with the corresponding speech features, serve as the training input to Tacotron2 for the acoustic model. Concurrently, we pre-train a Tacotron2-based Mandarin acoustic model on a large-scale Mandarin corpus. This model is then fine-tuned on a small-scale Dungan speech corpus to obtain a Dungan acoustic model that learns the alignment and mapping from unit sequences to spectrograms. The resulting spectrograms are converted into waveforms by the WaveRNN vocoder, enabling the synthesis of high-quality Mandarin or Dungan speech. Both subjective and objective experiments suggest that the proposed transfer-learning-based Dungan speech synthesis outperforms models trained only on the Dungan corpus, as well as other methods. Consequently, our method offers a strategy for speech synthesis in low-resource languages: add prosodic information and leverage a corpus of a similar, high-resource language through transfer learning.
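The core transfer step described above (warm-starting the Dungan acoustic model from pretrained Mandarin Tacotron2 weights, then fine-tuning on the small Dungan corpus) can be sketched in miniature. This is an illustrative toy, not the paper's code: real checkpoints hold tensors, and parameter names such as `embedding.weight` are assumptions. The key idea shown is copying every pretrained parameter whose shape matches, while leaving the text-unit embedding (whose unit inventory differs between Mandarin and Dungan) at its fresh initialization.

```python
# Toy sketch of warm-start transfer: parameters are (shape, values) pairs
# standing in for tensors. Parameter names are hypothetical.

def warm_start(pretrained, target, reinit_keys=("embedding.weight",)):
    """Copy each pretrained parameter whose shape matches the target model;
    skip listed keys (e.g. the text-unit embedding, whose vocabulary
    differs across languages) and any shape mismatches, leaving those
    at their fresh random initialization for fine-tuning."""
    copied, skipped = [], []
    for name, (shape, values) in target.items():
        src = pretrained.get(name)
        if name in reinit_keys or src is None or src[0] != shape:
            skipped.append(name)                   # keep fresh init
        else:
            target[name] = (shape, list(src[1]))   # transfer weights
            copied.append(name)
    return copied, skipped

# Pretrained Mandarin model (large corpus) and fresh Dungan model.
mandarin = {
    "embedding.weight":    ((4000, 512), [0.1]),  # Mandarin unit inventory
    "encoder.lstm.weight": ((512, 512),  [0.2]),
    "decoder.lstm.weight": ((1024, 512), [0.3]),
}
dungan = {
    "embedding.weight":    ((1800, 512), [0.0]),  # Dungan unit inventory
    "encoder.lstm.weight": ((512, 512),  [0.0]),
    "decoder.lstm.weight": ((1024, 512), [0.0]),
}
copied, skipped = warm_start(mandarin, dungan)
# Encoder/decoder weights transfer; the embedding stays freshly
# initialized and is learned during Dungan fine-tuning.
```

In a real PyTorch setup the same effect is commonly obtained by loading the pretrained checkpoint with `load_state_dict(..., strict=False)` after replacing the embedding layer, then continuing training on the low-resource data.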
Keywords