Journal of King Saud University: Computer and Information Sciences (Sep 2022)

Detailed analysis of implementation of aviation NMT system and the effects of aviation post-processing tools on TDIL tourism corpus

  • Saptarshi Paul,
  • Bipul Shyam Purkhyastha

Journal volume & issue
Vol. 34, no. 8
pp. 5030 – 5044

Abstract

Capable MT systems built with SMT and NMT are in regular use for Bengali and other Indian languages. The performance of an MT system is governed by its domain knowledge, which derives directly from the parallel corpus used to train the model. In the last few years, systems using various NMT models have achieved spectacular results, and organizations such as Google and Microsoft have shifted from SMT to NMT models. In this paper, we compare an implementation for the previously unexplored aviation domain with standard domains whose corpora are downloaded from TDIL (https://tdil.meity.gov.in/), and we also examine the impact of the post-processing tools on the TDIL Tourism corpus. The implementation was carried out using OpenNMT. An English-to-Bengali aviation parallel corpus has been developed and used together with multiple pre-processing and post-processing tools to obtain the desired results. The aviation post-processing tools were later applied to the TDIL Tourism corpus to test their effectiveness on a non-aviation but similar corpus. The result analysis involves comparing the BLEU scores of the aviation domain and of the Tourism domain before and after applying the pre-processing and post-processing tools.
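
The before-and-after BLEU comparison described above can be illustrated with a minimal sketch. It is not the authors' tooling: `corpus_bleu` is a plain, unsmoothed, single-reference BLEU written here for illustration, `apply_term_fixes` is a hypothetical stand-in for a post-processing tool (the abstract does not describe what the actual tools do), and the English toy sentences are invented placeholders rather than data from the aviation or TDIL corpora.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100), one reference per hypothesis, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            hc, rc = ngrams(h, n), ngrams(r, n)
            # Clipped counts: a hypothesis n-gram is credited at most as
            # often as it occurs in the reference.
            matches[n - 1] += sum(min(c, rc[g]) for g, c in hc.items())
            totals[n - 1] += sum(hc.values())
    if min(matches) == 0:
        return 0.0  # unsmoothed BLEU is zero if any n-gram order has no match
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output shorter than the reference.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


def apply_term_fixes(sentence, fixes):
    """Hypothetical post-processing step: token-level replacement of domain terms."""
    return " ".join(fixes.get(tok, tok) for tok in sentence.split())


# Toy placeholder data (assumption, not taken from any corpus in the paper).
references = ["the flight departs from gate four"]
raw_output = ["the flight departs from door four"]
fixes = {"door": "gate"}

post_processed = [apply_term_fixes(s, fixes) for s in raw_output]
bleu_before = corpus_bleu(raw_output, references)
bleu_after = corpus_bleu(post_processed, references)
print(f"BLEU before: {bleu_before:.2f}  after: {bleu_after:.2f}")
```

In practice an established scorer with standardized tokenization would normally be preferred over a hand-written one, since BLEU values are sensitive to tokenization choices, particularly for Bengali text.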

Keywords