IEEE Access (Jan 2022)

DbAPE: Denoising-Based APE System for Improving English-Myanmar NMT

  • May Myo Zin,
  • Teeradaj Racharak,
  • Minh Le Nguyen

DOI
https://doi.org/10.1109/ACCESS.2022.3185415
Journal volume & issue
Vol. 10
pp. 67047 – 67057

Abstract

Automatic post-editing (APE) research investigates methods for correcting systematic errors in machine translation (MT) output. Recent work has shown that APE can improve MT output quality; however, its effectiveness relies strongly on the availability of large-scale human-created APE triplets. The high production cost of human post-edited data has left most language pairs, including English-Myanmar, without APE triplets, which limits the applicability of the APE task. This work investigates how to conduct the APE task for English-Myanmar MT when human-edited APE triplets are unavailable. We build a denoising-based APE (DbAPE) system using only monolingual and parallel MT corpora. The system takes the source sentence (src) and the MT output (mt) as inputs and produces the post-edited mt as output by combining three processes: extracting word alignments, enriching mt with the extracted word alignment information, and denoising the enriched version of mt. We conduct extensive experiments by applying our APE system as a post-processor to the raw output of existing English-Myanmar MT systems. APE translations produced by DbAPE show statistically significant improvements of at least +4% BLEU and −16% TER points absolute over the original NMT output. Moreover, DbAPE can improve the quality of text generated by state-of-the-art systems such as mT5 and Google Translate. In addition, we perform word alignment experiments with four types of alignment methods and demonstrate that the proposed multilingual word aligner achieves robust performance relative to previous state-of-the-art models.
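The data flow of the three processes named in the abstract (alignment extraction, enrichment of mt, denoising) can be pictured with the minimal Python sketch below. It is an illustration only and not the authors' implementation: the lexicon-lookup aligner, the `<src:...>` hint format, the toy denoiser, and every function name are hypothetical stand-ins, and the tokens are placeholders rather than real English-Myanmar data. In the actual system the aligner and the denoiser are learned models.

```python
from typing import Callable, Dict, List, Tuple

Alignment = List[Tuple[int, int]]  # (source index, mt index) pairs


def extract_alignments(src: List[str], mt: List[str],
                       lexicon: Dict[str, str]) -> Alignment:
    """Toy aligner (assumption): link src[i] to mt[j] when the lexicon maps one to the other."""
    links = []
    for i, s in enumerate(src):
        target = lexicon.get(s)
        for j, m in enumerate(mt):
            if target is not None and m == target:
                links.append((i, j))
    return links


def enrich_mt(src: List[str], mt: List[str], links: Alignment) -> List[str]:
    """Toy enrichment (assumption): append each unaligned source word as a tagged hint."""
    aligned_src = {i for i, _ in links}
    hints = [f"<src:{s}>" for i, s in enumerate(src) if i not in aligned_src]
    return mt + hints


def make_toy_denoiser(lexicon: Dict[str, str]) -> Callable[[List[str]], List[str]]:
    """Stand-in for a trained denoising model: resolve hints through the lexicon."""
    def denoise(tokens: List[str]) -> List[str]:
        out = []
        for tok in tokens:
            if tok.startswith("<src:") and tok.endswith(">"):
                word = tok[5:-1]
                out.append(lexicon.get(word, word))
            else:
                out.append(tok)
        return out
    return denoise


def post_edit(src: List[str], mt: List[str], lexicon: Dict[str, str],
              denoiser: Callable[[List[str]], List[str]]) -> List[str]:
    """Chain the three stages: align, enrich, denoise."""
    links = extract_alignments(src, mt, lexicon)
    enriched = enrich_mt(src, mt, links)
    return denoiser(enriched)


# Placeholder example: the mt output dropped the translation of "sleeps".
lexicon = {"the": "T_the", "cat": "T_cat", "sleeps": "T_sleeps"}
src = ["the", "cat", "sleeps"]
mt = ["T_the", "T_cat"]
print(post_edit(src, mt, lexicon, make_toy_denoiser(lexicon)))
```

The sketch takes the denoiser as a callable so that the alignment and enrichment stages stay independent of whatever model performs the final correction.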

Keywords