IEEE Access (Jan 2019)
Machine Learning Based Optimized Pruning Approach for Decoding in Statistical Machine Translation
Abstract
An efficient decoding algorithm is critical to the success of any statistical machine translation system. The enormous search space makes exhaustive decoding impractically slow, so there is a trade-off between translation accuracy and decoding speed. Pruning algorithms (such as histogram pruning and threshold pruning) attempt to manage this trade-off. A pruning algorithm places pre-defined limits on its parameters (i.e., stack size and beam threshold) that improve translation quality and speed up the decoder. However, a single fixed setting of these parameters cannot deliver high-quality translations in optimal time for every input; the stack size and beam threshold should change with the structure of the text. In this paper, we identify the best stack size and beam threshold values at runtime, based on the structure and characteristics of the text, using a machine learning-based approach. The predicted parameter values are then applied in the beam search algorithm for decoding. Our experiments on low-resourced Asian languages show significant improvements in both translation accuracy and decoding time. The HindEnCorp and ILCI datasets are used as benchmarks, with the English-Hindi, Hindi-Marathi, Hindi-Konkani, and Bengali-Hindi language pairs, for our various experiments. Moreover, we incorporate the proposed technique into the cube pruning algorithm for faster decoding and observe further improvement with this approach.
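To make the two pruning parameters concrete, the sketch below shows stack-based beam search decoding with histogram pruning (keep at most `stack_size` hypotheses per stack) and threshold pruning (drop hypotheses scoring more than `beam_threshold` below the best in the stack). It is a minimal illustration, not the authors' implementation; the `Hypothesis` class, the `expand` callback, and a hypothetical `predict_pruning_params` regressor that would supply per-sentence parameter values are all assumptions introduced for exposition.

```python
# Conceptual sketch only: stack decoding with histogram and threshold pruning,
# where stack_size and beam_threshold are supplied per sentence (e.g. by a
# trained ML model, here a hypothetical predict_pruning_params(features)).
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Hypothesis:
    covered: int   # number of source words translated so far
    score: float   # model score (higher is better)
    output: str    # partial target-language string


def prune(stack: List[Hypothesis], stack_size: int,
          beam_threshold: float) -> List[Hypothesis]:
    """Apply threshold pruning, then histogram pruning, to one stack."""
    if not stack:
        return stack
    best = max(h.score for h in stack)
    # Threshold (beam) pruning: discard hypotheses far below the best score.
    kept = [h for h in stack if h.score >= best - beam_threshold]
    # Histogram pruning: keep only the stack_size highest-scoring hypotheses.
    kept.sort(key=lambda h: h.score, reverse=True)
    return kept[:stack_size]


def decode(source: List[str],
           expand: Callable[[Hypothesis, List[str]], Iterable[Hypothesis]],
           stack_size: int, beam_threshold: float) -> Hypothesis:
    """Beam search over coverage stacks; every stack is pruned before expansion."""
    stacks: List[List[Hypothesis]] = [[] for _ in range(len(source) + 1)]
    stacks[0].append(Hypothesis(covered=0, score=0.0, output=""))
    for i in range(len(source)):
        stacks[i] = prune(stacks[i], stack_size, beam_threshold)
        for hyp in stacks[i]:
            for new_hyp in expand(hyp, source):  # phrase-table expansions
                stacks[new_hyp.covered].append(new_hyp)
    final = prune(stacks[len(source)], stack_size, beam_threshold)
    return final[0]
```

In the approach described above, the fixed arguments `stack_size` and `beam_threshold` would instead be predicted for each input sentence from its textual features before decoding begins.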
Keywords