Discover Artificial Intelligence (Jan 2025)
Interval evaluation of temporal (in)stability for neural machine translation
Abstract
Though neural machine translation (NMT) has become the leading machine translation (MT) paradigm, its output may still contain errors. To improve NMT quality, it is important to investigate these errors and to see how NMT quality changes over time. The primary focus of this paper is on what is referred to here as the “temporal (in)stability of NMT”, a phenomenon that was uncovered in a year-long experiment and may be researched using interval evaluation methods. The paper presents data collected while observing how far, if at all, Google’s Neural Machine Translation (GNMT) system progressed during a year. The data were qualitatively evaluated based on a set of indicators. To that end, 250 Russian sentences were chosen. Over the course of a year, each sentence was repeatedly translated into French using the GNMT engine (with a time step of one month). The resulting translations were recorded and annotated in a specially designed supracorpora database, making it possible to register a series of 12 translations for each of the 250 Russian sentences. To annotate the translations, an error typology had to be elaborated that would help reveal whether the NMT system improved its output quality. The year-long experiment shows that NMT quality not only improves but may also decrease over time.
Keywords