Applied Sciences (Jul 2021)

Voice Activation for Low-Resource Languages

  • Aliaksei Kolesau
  • Dmitrij Šešok

DOI
https://doi.org/10.3390/app11146298
Journal volume & issue
Vol. 11, no. 14
p. 6298

Abstract

Voice activation systems are used to find a pre-defined word or phrase in an audio stream. Industry solutions, such as “OK, Google” for Android devices, are trained with millions of samples. In this work, we propose and investigate several ways to train a voice activation system when the in-domain data set is small. We compare self-training exemplar pre-training, fine-tuning a model pre-trained on another domain, joint training on both an out-of-domain high-resource and a target low-resource data set, and unsupervised pre-training. In our experiments, unsupervised pre-training and joint training with a high-resource data set from another domain significantly outperform a strong baseline of fine-tuning a model trained on another data set. We obtain a 7–25% relative improvement, depending on the model architecture. Additionally, we improve the best test accuracy on the Lithuanian data set from 90.77% to 93.85%.

Keywords