Vietnam Journal of Computer Science (Feb 2024)

Exploring Composite Indexes for Domain Adaptation in Neural Machine Translation

  • Nhan Vo Minh,
  • Khue Nguyen Tran Minh,
  • Long H. B. Nguyen,
  • Dien Dinh

DOI
https://doi.org/10.1142/S2196888823500148
Journal volume & issue
Vol. 11, no. 01
pp. 75–94

Abstract


Domain adaptation in neural machine translation (NMT) often involves datasets whose distribution differs from that of the training data. In such scenarios, k-nearest-neighbor machine translation (kNN-MT) has been shown to be effective at retrieving relevant information from large datastores. However, the high-dimensional context vectors produced by large NMT models make distance computation and storage expensive. To address this issue, index optimization techniques have been proposed, including the combination of an inverted file index (IVF) with product quantization (PQ), known as IVFPQ. In this paper, we explore recent indexing techniques for efficient domain adaptation in machine translation and combine multiple index structures to improve the efficiency of nearest-neighbor search on domain adaptation datasets for the machine translation task. Specifically, we evaluate the effectiveness of combining optimized product quantization (OPQ) and hierarchical navigable small-world (HNSW) indexing with IVFPQ. Our study aims to provide insights into which composite index methods are best suited to efficient nearest-neighbor search on domain adaptation datasets, with a focus on improving both accuracy and speed.
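To make the IVFPQ idea concrete, here is a minimal pure-Python sketch of the generic technique, not the authors' implementation and not the FAISS library. It assumes squared Euclidean distance, a naive k-means, and vectors whose dimension divides evenly into `m` sub-vectors. The coarse quantizer (IVF) assigns each vector to one of `nlist` lists. The residual (vector minus its list centroid) is then compressed by product quantization into `m` small codes. A query probes the `nprobe` closest lists and scores stored codes with precomputed asymmetric distance tables. The class name `IVFPQ` and the parameter names `nlist`, `m`, `ksub`, and `nprobe` are illustrative choices for this sketch. OPQ and HNSW are deliberately omitted.

```python
import random
from typing import List, Tuple

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v: Vector, centroids: List[Vector]) -> int:
    """Index of the closest centroid (first one wins on ties)."""
    return min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))


def kmeans(points: List[Vector], k: int, iters: int = 10, seed: int = 0) -> List[Vector]:
    """Naive Lloyd's k-means; empty clusters keep their previous centroid."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            buckets[nearest(p, centroids)].append(p)
        for j, b in enumerate(buckets):
            if b:
                centroids[j] = [sum(col) / len(b) for col in zip(*b)]
    return centroids


class IVFPQ:
    """Hypothetical minimal IVF + PQ index (illustration only)."""

    def __init__(self, nlist: int, m: int, ksub: int, seed: int = 0):
        self.nlist, self.m, self.ksub, self.seed = nlist, m, ksub, seed

    def _split(self, v: Vector) -> List[Vector]:
        d = len(v) // self.m
        return [v[i * d:(i + 1) * d] for i in range(self.m)]

    def _residual(self, v: Vector, list_id: int) -> Vector:
        return [x - c for x, c in zip(v, self.coarse[list_id])]

    def train(self, data: List[Vector]) -> None:
        assert len(data[0]) % self.m == 0, "dimension must be divisible by m"
        # Coarse quantizer: partitions the space into nlist inverted lists.
        self.coarse = kmeans(data, self.nlist, seed=self.seed)
        residuals = [self._residual(v, nearest(v, self.coarse)) for v in data]
        # One PQ codebook of ksub centroids per residual sub-space.
        split = [self._split(r) for r in residuals]
        self.codebooks = [
            kmeans([s[i] for s in split], self.ksub, seed=self.seed + i + 1)
            for i in range(self.m)
        ]
        self.lists: List[List[Tuple[int, Tuple[int, ...]]]] = [[] for _ in range(self.nlist)]

    def add(self, vid: int, v: Vector) -> None:
        list_id = nearest(v, self.coarse)
        subs = self._split(self._residual(v, list_id))
        code = tuple(nearest(s, cb) for s, cb in zip(subs, self.codebooks))
        self.lists[list_id].append((vid, code))  # store only the compact code

    def search(self, q: Vector, k: int, nprobe: int = 1) -> List[Tuple[float, int]]:
        probe = sorted(range(self.nlist), key=lambda c: sq_dist(q, self.coarse[c]))[:nprobe]
        results: List[Tuple[float, int]] = []
        for list_id in probe:
            subs = self._split(self._residual(q, list_id))
            # Asymmetric distance tables: exact query sub-vector vs. every sub-centroid.
            tables = [[sq_dist(s, c) for c in cb] for s, cb in zip(subs, self.codebooks)]
            for vid, code in self.lists[list_id]:
                results.append((sum(tables[i][code[i]] for i in range(self.m)), vid))
        results.sort()
        return results[:k]
```

Usage is `index.train(data)`, then `index.add(i, v)` for each vector, then `index.search(query, k, nprobe)`, which returns `(approximate distance, id)` pairs in ascending order. In composite setups like those the abstract mentions, OPQ would typically apply a learned rotation before `_split`. HNSW would typically replace the linear scan over `self.coarse` when choosing lists to probe. Both are left out here to keep the sketch short.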

Keywords