Effect of Word Sense Disambiguation on Neural Machine Translation: A Case Study in Korean

Quang-Phuoc Nguyen; Anh-Dung Vo; Joon-Choul Shin; Cheol-Young Ock

doi:10.1109/ACCESS.2018.2851281

IEEE Access (Jan 2018)

Affiliations

Quang-Phuoc Nguyen: Department of IT Convergence, University of Ulsan, Ulsan, South Korea
Anh-Dung Vo: Department of IT Convergence, University of Ulsan, Ulsan, South Korea
Joon-Choul Shin: Department of IT Convergence, University of Ulsan, Ulsan, South Korea
Cheol-Young Ock: Department of IT Convergence, University of Ulsan, Ulsan, South Korea

DOI: https://doi.org/10.1109/ACCESS.2018.2851281
Journal volume & issue: Vol. 6, pp. 38512–38523

Abstract

With the advent of robust deep learning, neural machine translation (NMT) has made great progress and has recently become the dominant paradigm in machine translation (MT). However, it still faces the challenge of word ambiguity, which forces NMT to choose among several translation candidates that represent different senses of an input word. This research presents a case study that uses Korean word sense disambiguation (WSD) to improve NMT performance. First, we constructed a Korean lexical semantic network (LSN) as a large-scale lexical semantic knowledge base. Then, based on the Korean LSN, we built a Korean WSD preprocessor that annotates Korean words in the training corpus with their correct senses. Finally, we conducted a series of translation experiments on the Korean-English, Korean-French, Korean-Spanish, and Korean-Japanese language pairs. The experimental results show that our Korean WSD system significantly improves the translation quality of NMT in terms of the BLEU, TER, and DLRATIO metrics. Averaged over all the language pairs, it improved precision by 2.94 BLEU points and translation error prevention by 4.04 TER points and 4.51 DLRATIO points.

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
