Voice Activation for Low-Resource Languages

Aliaksei Kolesau; Dmitrij Šešok

doi:10.3390/app11146298

Applied Sciences (Jul 2021)

Voice Activation for Low-Resource Languages

Aliaksei Kolesau,
Dmitrij Šešok

Affiliations

Aliaksei Kolesau: Department of Information Technologies, Vilnius Gediminas Technical University, Saulėtekio al. 11, LT-10223 Vilnius, Lithuania
Dmitrij Šešok: Department of Information Technologies, Vilnius Gediminas Technical University, Saulėtekio al. 11, LT-10223 Vilnius, Lithuania

DOI: https://doi.org/10.3390/app11146298
Journal volume & issue: Vol. 11, no. 14
p. 6298

Abstract

Read online

Voice activation systems are used to find a pre-defined word or phrase in the audio stream. Industry solutions, such as “OK, Google” for Android devices, are trained with millions of samples. In this work, we propose and investigate several ways to train a voice activation system when the in-domain data set is small. We compare self-training exemplar pre-training, fine-tuning a model pre-trained on another domain, joint training on both an out-of-domain high-resource and a target low-resource data set, and unsupervised pre-training. In our experiments, the unsupervised pre-training and the joint-training with a high-resource data set from another domain significantly outperform a strong baseline of fine-tuning a model trained on another data set. We obtain 7–25% relative improvement depending on the model architecture. Additionally, we improve the best test accuracy on the Lithuanian data set from 90.77% to 93.85%.

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci

About the journal

Abstract

Keywords