Applied Sciences (Jan 2023)

Low-Resource Neural Machine Translation Improvement Using Source-Side Monolingual Data

  • Atnafu Lambebo Tonja,
  • Olga Kolesnikova,
  • Alexander Gelbukh,
  • Grigori Sidorov

DOI: https://doi.org/10.3390/app13021201
Journal volume & issue: Vol. 13, no. 2, p. 1201

Abstract

Despite many proposals for improving neural machine translation (NMT) for low-resource languages, the problem remains difficult. It becomes even harder when the few available resources cover only a single domain. In this paper, we discuss the applicability of source-side monolingual data from low-resource languages to improving NMT systems for such languages. In our experiments, we used Wolaytta–English as the low-resource language pair. We discuss the use of self-learning and fine-tuning approaches to improve the NMT system for Wolaytta–English translation using both authentic and synthetic datasets. The self-learning approach showed BLEU score improvements of +2.7 and +2.4 for Wolaytta–English and English–Wolaytta translation, respectively, over the best-performing baseline model. Further fine-tuning of the best-performing self-learning model showed BLEU score improvements of +1.2 and +0.6 for Wolaytta–English and English–Wolaytta translation, respectively. We reflect on our contributions and outline plans for future work in this difficult field of study.

Keywords