IEEE Access (Jan 2019)
Machine Learning Based Optimized Pruning Approach for Decoding in Statistical Machine Translation
Abstract
An efficient decoding algorithm is critical to the success of any statistical machine translation system. The enormous search space makes exhaustive decoding impractically slow, so there is a trade-off between translation accuracy and decoding speed. Pruning algorithms (such as histogram pruning and threshold pruning) attempt to manage this trade-off. A pruning algorithm places pre-defined limits on its parameters (i.e., stack size and beam threshold) that improve translation quality and speed up the decoder. However, a single fixed setting of these parameters cannot deliver high-quality translations in optimal time for every input; the stack size and beam threshold should change with the structure of the text. In this paper, we identify the best stack size and beam threshold values at runtime, based on the structure and characteristics of the text, using a machine learning-based approach. The predicted parameter values are then applied in the beam search algorithm for decoding. Our experiments on low-resourced Asian languages show significant improvements in both translation accuracy and decoding time. The HindEnCorp and ILCI datasets are used as benchmarks, with the English-Hindi, Hindi-Marathi, Hindi-Konkani, and Bengali-Hindi language pairs, for our various experiments. Moreover, we incorporate the proposed technique into the cube pruning algorithm for faster decoding and observe further improvement with this approach.
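To make the two pruning parameters concrete, the sketch below shows stack-based beam search decoding with histogram pruning (keep at most `stack_size` hypotheses per stack) and threshold pruning (drop hypotheses scoring more than `beam_threshold` below the best in the stack). It is a minimal illustration, not the authors' implementation; the `Hypothesis` class, the `expand` callback, and a hypothetical `predict_pruning_params` regressor that would supply per-sentence parameter values are all assumptions introduced for exposition.

```python
# Conceptual sketch only: stack decoding with histogram and threshold pruning,
# where stack_size and beam_threshold are supplied per sentence (e.g. by a
# trained ML model, here a hypothetical predict_pruning_params(features)).
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Hypothesis:
    covered: int   # number of source words translated so far
    score: float   # model score (higher is better)
    output: str    # partial target-language string


def prune(stack: List[Hypothesis], stack_size: int,
          beam_threshold: float) -> List[Hypothesis]:
    """Apply threshold pruning, then histogram pruning, to one stack."""
    if not stack:
        return stack
    best = max(h.score for h in stack)
    # Threshold (beam) pruning: discard hypotheses far below the best score.
    kept = [h for h in stack if h.score >= best - beam_threshold]
    # Histogram pruning: keep only the stack_size highest-scoring hypotheses.
    kept.sort(key=lambda h: h.score, reverse=True)
    return kept[:stack_size]


def decode(source: List[str],
           expand: Callable[[Hypothesis, List[str]], Iterable[Hypothesis]],
           stack_size: int, beam_threshold: float) -> Hypothesis:
    """Beam search over coverage stacks; every stack is pruned before expansion."""
    stacks: List[List[Hypothesis]] = [[] for _ in range(len(source) + 1)]
    stacks[0].append(Hypothesis(covered=0, score=0.0, output=""))
    for i in range(len(source)):
        stacks[i] = prune(stacks[i], stack_size, beam_threshold)
        for hyp in stacks[i]:
            for new_hyp in expand(hyp, source):  # phrase-table expansions
                stacks[new_hyp.covered].append(new_hyp)
    final = prune(stacks[len(source)], stack_size, beam_threshold)
    return final[0]
```

In the approach described above, the fixed arguments `stack_size` and `beam_threshold` would instead be predicted for each input sentence from its textual features before decoding begins.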
Keywords