Jisuanji kexue yu tansuo (Mar 2024)

Low-Resource Machine Translation Based on Training Strategy with Changing Gradient Weight

  • WANG Jiaqi, ZHU Junguo, YU Zhengtao

DOI
https://doi.org/10.3778/j.issn.1673-9418.2211078
Journal volume & issue
Vol. 18, no. 3
pp. 731 – 739

Abstract
In recent years, neural network models such as the Transformer have achieved significant success in machine translation. However, training these models relies on abundant labeled data, which poses a challenge for low-resource machine translation, where parallel corpora are limited in scale. This limitation often leads to poor performance and a tendency to overfit high-frequency vocabulary, reducing the model's generalization ability on the test set. To alleviate these issues, this paper proposes a gradient-weight modification strategy. Specifically, on top of the Adam algorithm, the gradients computed for each new batch are multiplied by a coefficient that increases incrementally, weakening the model's dependence on high-frequency features during early training while preserving the algorithm's rapid convergence in later stages. The paper also describes the modified training procedure, including the adjustment and decay of the coefficient, so that different aspects are emphasized at different training stages. The goal of this strategy is to increase attention to low-frequency vocabulary and prevent the model from overfitting to high-frequency terms. Translation experiments are conducted on three low-resource bilingual datasets, and the proposed method achieves improvements of 0.72, 1.37, and 1.04 BLEU points over the baseline model on the respective test sets.
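
The abstract conveys only the idea of the method; as a rough illustration, the sketch below scales each batch's gradients by a growing coefficient before a standard PyTorch Adam step. The linear schedule in grad_coeff, its bounds (start, warmup_steps), and the toy model and loss are assumptions made for illustration, not the authors' published configuration.

# Minimal sketch (not the authors' code): multiply the gradients of each
# batch by a coefficient that grows over training, then take an Adam step.
# The linear ramp and its bounds (start=0.1, warmup_steps=1000) are
# hypothetical choices for illustration only.
import torch
import torch.nn as nn

model = nn.Linear(512, 512)  # stand-in for a Transformer translation model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def grad_coeff(step, start=0.1, warmup_steps=1000):
    # Coefficient that ramps from `start` up to 1.0 over `warmup_steps`.
    return min(1.0, start + (1.0 - start) * step / warmup_steps)

def training_step(batch_x, batch_y, step):
    optimizer.zero_grad()
    # Placeholder loss; a translation model would use token-level cross-entropy.
    loss = nn.functional.mse_loss(model(batch_x), batch_y)
    loss.backward()
    c = grad_coeff(step)
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                p.grad.mul_(c)  # damp early-training updates, full strength later
    optimizer.step()
    return loss.item()

# Example usage with random data for a single step:
# x, y = torch.randn(32, 512), torch.randn(32, 512)
# training_step(x, y, step=0)

In the paper's actual setup, the coefficient schedule also includes the adjustment and decay described in the abstract; the simple linear ramp here is only a stand-in for that schedule.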
