IEEE Access (Jan 2025)
Using Kolmogorov–Arnold Networks in Transformer Model: A Study on Low-Resource Neural Machine Translation
Abstract
Neural machine translation is one of the most significant research areas to emerge with the widespread use of deep learning. Unlike many other problems, however, machine translation involves at least two languages, so the amount of parallel data available between the languages to be translated is an important factor in translation success. Low-resource languages suffer from a shortage of such data, which poses a significant challenge to machine translation quality. Transformer models have achieved great success by modeling long-term dependencies with the self-attention mechanism, yet the feed-forward network (FFN) layers that follow each self-attention layer constitute almost all of the model's non-embedding parameters. Studies in the literature have therefore questioned the necessity of these FFN layers in the Transformer model and investigated alternatives to them. Kolmogorov–Arnold networks (KAN) have recently come to the forefront as a new neural network architecture that has achieved success on many problems. The KAN structure can better learn patterns in complex data by using learnable activation functions instead of fixed ones. Accordingly, this study proposes using KAN layers instead of FFN layers in the Transformer model for the low-resource translation problem. The aim is to mitigate the low-resource problem and to present a new alternative within the Transformer model by employing the adaptive activation functions of KANs. In traditional Transformer models, FFN layers consist of two linear transformations with a ReLU activation between them. In the proposed structure, KAN layers first replace the FFN layers without any change to the model dimensions; experiments are then conducted with lower-dimensional KAN layers and various parameter sets. The study is carried out on Turkish–English and Kazakh–English language pairs. The findings reveal that using KAN layers instead of FFN layers has a positive effect on translation success, and that KAN layers of similar or lower dimensionality significantly increase the performance of the Transformer model.
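To make the architectural substitution concrete, the following is a minimal PyTorch sketch, not the authors' code, contrasting the standard position-wise FFN with a drop-in KAN-style replacement. The SimpleKANLayer shown here is an illustrative assumption: it builds learnable per-edge activations from a fixed radial-basis expansion rather than the B-spline bases of the original KAN formulation, and all dimensions are placeholders.

    # Minimal sketch (illustrative only): standard Transformer FFN vs. a
    # simplified KAN-style drop-in replacement. Hyperparameters are placeholders.
    import torch
    import torch.nn as nn

    class TransformerFFN(nn.Module):
        """Standard position-wise FFN: two linear maps with a ReLU in between."""
        def __init__(self, d_model: int, d_hidden: int):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.ReLU(),
                nn.Linear(d_hidden, d_model),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.net(x)

    class SimpleKANLayer(nn.Module):
        """Simplified KAN-style layer: learnable per-edge activations over a
        fixed radial-basis expansion (an assumption, not the original B-spline basis)."""
        def __init__(self, in_dim: int, out_dim: int, num_basis: int = 8):
            super().__init__()
            # Fixed basis centres on [-1, 1]; the coefficients are the learnable part.
            self.register_buffer("centres", torch.linspace(-1.0, 1.0, num_basis))
            self.coeffs = nn.Parameter(0.1 * torch.randn(out_dim, in_dim, num_basis))
            self.base = nn.Linear(in_dim, out_dim)  # linear path alongside the learned activations

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (..., in_dim) -> RBF features: (..., in_dim, num_basis)
            phi = torch.exp(-((x.unsqueeze(-1) - self.centres) ** 2))
            # Sum the learnable edge activations over the input dimension: (..., out_dim)
            spline = torch.einsum("...ib,oib->...o", phi, self.coeffs)
            return self.base(x) + spline

    class KANFFN(nn.Module):
        """Drop-in replacement for the FFN block built from two KAN-style layers."""
        def __init__(self, d_model: int, d_hidden: int):
            super().__init__()
            self.kan1 = SimpleKANLayer(d_model, d_hidden)
            self.kan2 = SimpleKANLayer(d_hidden, d_model)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.kan2(self.kan1(x))

    # Both modules map (batch, seq_len, d_model) -> (batch, seq_len, d_model),
    # so either can sit after the self-attention sublayer; the KAN variant may
    # use a smaller hidden dimension, as explored in the experiments.
    x = torch.randn(2, 16, 512)
    print(TransformerFFN(512, 2048)(x).shape, KANFFN(512, 256)(x).shape)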
Keywords