Jisuanji kexue yu tansuo (Mar 2024)

Low-Resource Machine Translation Based on Training Strategy with Changing Gradient Weight

  • WANG Jiaqi, ZHU Junguo, YU Zhengtao

DOI
https://doi.org/10.3778/j.issn.1673-9418.2211078
Journal volume & issue
Vol. 18, no. 3
pp. 731 – 739

Abstract
In recent years, neural network models such as the Transformer have achieved significant success in machine translation. However, training these models relies on abundant labeled data, which poses a challenge for low-resource machine translation, where parallel corpora are limited in scale. This limitation often leads to poor performance and a tendency to overfit high-frequency vocabulary, reducing the model's generalization ability on the test set. To alleviate these issues, this paper proposes a gradient-weight modification strategy. Specifically, on top of the Adam algorithm, the gradients computed for each new batch are multiplied by a coefficient that increases incrementally, weakening the model's dependence on high-frequency features during early training while preserving the algorithm's rapid convergence in later stages. The paper also describes the modified training procedure, including the adjustment and decay of the coefficient, so that different aspects are emphasized at different training stages. The goal of this strategy is to increase attention to low-frequency vocabulary and prevent the model from overfitting to high-frequency terms. Translation experiments are conducted on three low-resource bilingual datasets, and the proposed method achieves improvements of 0.72, 1.37, and 1.04 BLEU points over the baseline model on the respective test sets.
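
The abstract conveys only the idea of the method; as a rough illustration, the sketch below scales each batch's gradients by a growing coefficient before a standard PyTorch Adam step. The linear schedule in grad_coeff, its bounds (start, warmup_steps), and the toy model and loss are assumptions made for illustration, not the authors' published configuration.

# Minimal sketch (not the authors' code): multiply the gradients of each
# batch by a coefficient that grows over training, then take an Adam step.
# The linear ramp and its bounds (start=0.1, warmup_steps=1000) are
# hypothetical choices for illustration only.
import torch
import torch.nn as nn

model = nn.Linear(512, 512)  # stand-in for a Transformer translation model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def grad_coeff(step, start=0.1, warmup_steps=1000):
    # Coefficient that ramps from `start` up to 1.0 over `warmup_steps`.
    return min(1.0, start + (1.0 - start) * step / warmup_steps)

def training_step(batch_x, batch_y, step):
    optimizer.zero_grad()
    # Placeholder loss; a translation model would use token-level cross-entropy.
    loss = nn.functional.mse_loss(model(batch_x), batch_y)
    loss.backward()
    c = grad_coeff(step)
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                p.grad.mul_(c)  # damp early-training updates, full strength later
    optimizer.step()
    return loss.item()

# Example usage with random data for a single step:
# x, y = torch.randn(32, 512), torch.randn(32, 512)
# training_step(x, y, step=0)

In the paper's actual setup, the coefficient schedule also includes the adjustment and decay described in the abstract; the simple linear ramp here is only a stand-in for that schedule.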
