IEEE Access (Jan 2025)

Transformer-Based Amharic-to-English Machine Translation With Character Embedding and Combined Regularization Techniques

  • Surafiel Habib Asefa,
  • Yaregal Assabie

DOI
https://doi.org/10.1109/ACCESS.2024.3521985
Journal volume & issue
Vol. 13
pp. 1090–1105

Abstract

Amharic is the working language of Ethiopia and, owing to its Semitic characteristics, is known for its complex morphology. It is also an under-resourced language, which presents significant challenges for natural language processing tasks such as machine translation. The primary challenges are the scarcity of parallel data, which increases the risk of overfitting and limits the model's ability to generalize, and the complex morphology of Amharic, which further complicates the learning of translation patterns. This study proposes a Transformer-based Amharic-to-English neural machine translation model that leverages character-level embeddings and integrates advanced regularization techniques, including dropout, L1, L2, and Elastic Net. By operating on character-level embeddings, the model captures the intricate morphological patterns of Amharic and effectively handles out-of-vocabulary words. Our model significantly improves upon the previous state-of-the-art results on the Amharic-to-English neural machine translation benchmark, achieving a BLEU score of 40.59, 7% higher than the previous best result. Among the regularization techniques tested, combining L2 regularization with dropout applied to the pointwise feed-forward network yielded the best translation performance. Additionally, the proposed model reduces the parameter count from 75 million to just 5.4 million, demonstrating substantial computational efficiency while maintaining high accuracy. Extensive experiments demonstrated improvements in test accuracy, loss reduction, and translation fidelity compared to word-level embedding models. This research provides valuable insights into addressing the challenges of low-resource and morphologically complex languages, and it points to promising directions for future work, including multilingual models, attention-mechanism optimization, and the broader application of hybrid regularization techniques in the Transformer architecture.
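To make the two ideas emphasized in the abstract concrete, the minimal sketch below shows a character-level embedding feeding a Transformer encoder whose sublayers use dropout, plus an Elastic Net penalty that reduces to plain L2 regularization when the L1 coefficient is zero. This is an illustrative sketch, not the authors' code: the framework (PyTorch), vocabulary size, and every hyperparameter shown are assumptions, since none of these details are given in the abstract.

# Minimal sketch (not the authors' implementation) of character-level
# embeddings plus combined dropout and L1/L2 (Elastic Net) regularization.
# All sizes and coefficients below are illustrative assumptions.
import torch
import torch.nn as nn

class CharTransformerEncoder(nn.Module):
    def __init__(self, n_chars=350, d_model=256, n_heads=4, d_ff=1024,
                 n_layers=3, dropout=0.1):
        super().__init__()
        # Character-level embedding: one vector per character, keeping the
        # vocabulary tiny and avoiding out-of-vocabulary word forms.
        self.char_emb = nn.Embedding(n_chars, d_model, padding_idx=0)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=d_ff,
            dropout=dropout,          # dropout is also applied inside the feed-forward sublayer
            batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, char_ids):              # char_ids: (batch, seq_len)
        return self.encoder(self.char_emb(char_ids))

def elastic_net_penalty(model, l1=1e-6, l2=1e-5):
    """Combined L1 + L2 (Elastic Net) penalty added to the training loss.
    Setting l1=0 gives plain L2 regularization, the variant the abstract
    reports as best when paired with dropout."""
    l1_term = sum(p.abs().sum() for p in model.parameters())
    l2_term = sum((p ** 2).sum() for p in model.parameters())
    return l1 * l1_term + l2 * l2_term

# Usage: add the penalty to the task loss at each optimization step.
model = CharTransformerEncoder()
ids = torch.randint(1, 350, (2, 40))          # dummy batch of character ids
loss = model(ids).pow(2).mean()               # placeholder for the translation loss
loss = loss + elastic_net_penalty(model)
loss.backward()

Setting l1=0 while keeping dropout active corresponds to the L2-plus-dropout combination that the abstract reports as the best-performing configuration.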

Keywords