Feature-Based Decipherment for Machine Translation

Iftekhar Naim; Parker Riley; Daniel Gildea

doi:10.1162/coli_a_00326

Computational Linguistics (Sep 2018)

Affiliations

Iftekhar Naim: Google.
Parker Riley: University of Rochester, Computer Science Department.
Daniel Gildea: University of Rochester, Computer Science Department.

Journal volume & issue: Vol. 44, no. 3
pp. 525 – 546

Abstract

Orthographic similarities across languages provide a strong signal for unsupervised probabilistic transduction (decipherment) for closely related language pairs. The existing decipherment models, however, are not well suited for exploiting these orthographic similarities. We propose a log-linear model with latent variables that incorporates orthographic similarity features. Maximum likelihood training is computationally expensive for the proposed log-linear model. To address this challenge, we perform approximate inference via Markov chain Monte Carlo sampling and contrastive divergence. Our results show that the proposed log-linear model with contrastive divergence outperforms the existing generative decipherment models by exploiting the orthographic features. The model both scales to large vocabularies and preserves accuracy in low- and no-resource contexts.
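
The abstract combines three ingredients: a log-linear model with latent variables, orthographic similarity features, and contrastive divergence (CD) training driven by Markov chain Monte Carlo sampling. The Python sketch below shows one way these pieces can fit together on a toy problem. It is not the authors' model. The feature set (normalized Levenshtein similarity plus an exact-match indicator), the joint distribution p(f, e) ∝ exp(w · φ(f, e)) over source/target word pairs with e latent, the exact positive-phase draw, the uniform Metropolis-Hastings proposal, and all function names and word lists are illustrative assumptions.

```python
import math
import random

# All names, features, and word lists here are illustrative assumptions,
# not the authors' implementation.

def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                  # delete ca
                           cur[j - 1] + 1,               # insert cb
                           prev[j - 1] + (ca != cb)))    # substitute or match
        prev = cur
    return prev[-1]

def features(f: str, e: str) -> list[float]:
    """Hypothetical orthographic features: normalized similarity, exact match."""
    sim = 1.0 - levenshtein(f, e) / max(len(f), len(e), 1)
    return [sim, 1.0 if f == e else 0.0]

def score(w: list[float], f: str, e: str) -> float:
    """Unnormalized log-potential w . phi(f, e)."""
    return sum(wi * xi for wi, xi in zip(w, features(f, e)))

def sample_latent(w, f, targets, rng):
    """Positive phase: exact draw of the latent e ~ p(e | f) (small vocabulary)."""
    logits = [score(w, f, e) for e in targets]
    top = max(logits)
    weights = [math.exp(s - top) for s in logits]   # stabilized softmax weights
    return rng.choices(targets, weights=weights, k=1)[0]

def mh_step(w, state, sources, targets, rng):
    """One Metropolis-Hastings step on the joint (f, e). The proposal
    resamples f or e uniformly, so it is symmetric and cancels in the ratio."""
    f, e = state
    if rng.random() < 0.5:
        proposal = (rng.choice(sources), e)
    else:
        proposal = (f, rng.choice(targets))
    log_ratio = score(w, *proposal) - score(w, f, e)
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return proposal
    return state

def cd_update(w, observed, sources, targets, rng, k=1, lr=0.1):
    """One CD-k ascent step on sum_f log p(f), where p(f) = sum_e p(f, e)."""
    grad = [0.0] * len(w)
    for f in observed:
        e = sample_latent(w, f, targets, rng)
        pos = features(f, e)
        state = (f, e)                 # negative chain starts at the data
        for _ in range(k):
            state = mh_step(w, state, sources, targets, rng)
        neg = features(*state)
        for i, (p, n) in enumerate(zip(pos, neg)):
            grad[i] += p - n
    return [wi + lr * g / len(observed) for wi, g in zip(w, grad)]

if __name__ == "__main__":
    rng = random.Random(0)
    sources = ["nacht", "haus", "wasser"]   # toy "cipher" words (hypothetical)
    targets = ["night", "house", "water"]   # toy "plain" words (hypothetical)
    w = [0.0, 0.0]
    for _ in range(50):
        w = cd_update(w, sources, sources, targets, rng)
    print("learned weights:", w)
```

The gradient of log p(f) is the feature expectation under p(e | f) minus the expectation under the full joint distribution. Exact maximum likelihood needs the second term, which requires the partition function or a long-run sample from the model. CD instead starts the negative-phase chain at the data and runs only k sampling steps, trading some bias for a much cheaper update.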

Published in Computational Linguistics

ISSN: 0891-2017 (Print); 1530-9312 (Online)
Publisher: The MIT Press
Country of publisher: United States
LCC subjects: Language and Literature: Philology. Linguistics: Computational linguistics. Natural language processing
Website: https://direct.mit.edu/coli