Journal of Data Mining and Digital Humanities (Apr 2024)

Ainu–Japanese Bi-directional Neural Machine Translation: A Step Towards Linguistic Preservation of Ainu, An Under-Resourced Indigenous Language in Japan

  • So Miyagawa

DOI
https://doi.org/10.46298/jdmdh.13151
Journal volume & issue
Vol. NLP4DH, no. Digital humanities in...

Abstract

Read online

This study presents a groundbreaking approach to preserving the Ainu language, recognized as critically endangered by UNESCO, by developing a bi-directional neural machine translation (MT) system between Ainu and Japanese. Utilizing the Marian MT framework, known for its effectiveness with resource-scarce languages, the research aims to overcome the linguistic complexities inherent in Ainu's polysynthetic structure. The paper delineates a comprehensive methodology encompassing data collection from diverse Ainu text sources, meticulous preprocessing, and the deployment of neural MT models, culminating in the achievement of significant SacreBLEU scores that underscore the models' translation accuracy. The findings illustrate the potential of advanced MT technology to facilitate linguistic preservation and educational endeavors, advocating for integrating such technologies in safeguarding endangered languages. This research not only underscores the critical role of MT in bridging language divides but also sets a precedent for employing computational linguistics to preserve cultural and linguistic heritage.

Keywords