International Journal of Mathematical, Engineering and Management Sciences (Oct 2024)

Improved Urdu-English Neural Machine Translation with a fully Convolutional Neural Network Encoder

  • Huma Israr,
  • Muhammad Khuram Shahzad,
  • Shahid Anwar

DOI
https://doi.org/10.33889/IJMEMS.2024.9.5.056
Journal volume & issue
Vol. 9, no. 5
pp. 1067 – 1088

Abstract

Read online

Neural machine translation (NMT) approaches driven by artificial intelligence (AI) has gained more and more attention in recent years, mainly due to their simplicity yet state-of-the-art performance. Despite NMT models with attention mechanism relying heavily on the accessibility of substantial parallel corpora, they have demonstrated efficacy even for languages with limited linguistic resources. The convolutional neural network (CNN) is frequently employed in tasks involving visual and speech recognition. Implementing CNN for MT is still challenging compared to the predominant approaches. Recent research has shown that the CNN-based NMT model cannot capture long-term dependencies present in the source sentence. The CNN-based model can only capture the word dependencies within the width of its filters. This unnatural character often causes a worse performance for CNN-based NMT than the RNN-based NMT models. This study introduces a simple method to improve neural translation of a low-resource language, specifically Urdu-English (UR-EN). In this paper, we use a Fully Convolutional Neural Network (FConv-NN) based NMT architecture to create a powerful MT encoder for UR-EN translation that can capture the long dependency of words in a sentence. Although the model is quite simple, it yields strong empirical results. Experimental results show that the FConv-NN model consistently outperforms the traditional CNN-based model with filters. On the Urdu-English Dataset, the FConv-NN model produces translation with a gain of 18.42 BLEU points. Moreover, the quantitative and comparative analysis shows that in a low-resource setting, FConv-NN-based NMT outperforms conventional CNN-based NMT models.

Keywords