On the scalability of data augmentation techniques for low-resource machine translation between Chinese and Vietnamese

Huan Vu; Ngoc Dung Bui

doi:10.1080/24751839.2023.2186625

Journal of Information and Telecommunication (Apr 2023)

Affiliations

Huan Vu: Faculty of Information Technology, University of Transport and Communications, Dong Da, Ha Noi, Viet Nam
Ngoc Dung Bui: Faculty of Information Technology, University of Transport and Communications, Dong Da, Ha Noi, Viet Nam

DOI: https://doi.org/10.1080/24751839.2023.2186625
Journal volume & issue: Vol. 7, no. 2
pp. 241 – 253

Abstract

Neural Machine Translation (NMT) has consistently been shown to be a standard choice for building translation systems, in both academia and industry. For low-resource language pairs, data augmentation techniques have been widely used to tackle the data shortage problem in NMT. In this paper, we investigate the scaling behaviour of a transformer-based NMT model with respect to an increasing amount of synthetic data. Through experiments conducted on the Chinese-to-Vietnamese translation task, we aim to provide a guideline for applying several methods, such as back-translation, tagged back-translation, self-training and sentence concatenation, to a low-resource, less-related language pair. Our results suggest that choosing the appropriate amount of synthetic data is a crucial task when building NMT systems. In addition, when combining methods, it is recommended to tag the data sources before training.

Published in Journal of Information and Telecommunication

ISSN: 2475-1839 (Print); 2475-1847 (Online)
Publisher: Taylor & Francis Group
Country of publisher: United Kingdom
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Telecommunication; Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: https://www.tandfonline.com/journals/tjit
