IEEE Access (Jan 2025)
Using Kolmogorov–Arnold Networks in Transformer Model: A Study on Low-Resource Neural Machine Translation
Abstract
Neural machine translation is one of the most significant research areas to emerge with the widespread use of deep learning. Unlike many other problems, however, machine translation involves at least two languages, so the amount of parallel data available between the languages to be translated is an important factor in translation success. Low-resource languages suffer from a shortage of such data, which poses a significant challenge to machine translation quality. Transformer models have achieved great success by modeling long-term dependencies with the self-attention mechanism, yet the feed-forward network (FFN) layers that follow each self-attention layer constitute almost all of the model's non-embedding parameters. Studies in the literature have therefore questioned the necessity of these FFN layers in the Transformer model and investigated alternatives to them. Kolmogorov–Arnold networks (KAN) have recently come to the forefront as a new neural network architecture that has achieved success on many problems. The KAN structure can better learn patterns in complex data by using learnable activation functions instead of fixed ones. Accordingly, this study proposes using KAN layers instead of FFN layers in the Transformer model for the low-resource translation problem. The aim is to mitigate the low-resource problem and to present a new alternative within the Transformer model by employing the adaptive activation functions of KANs. In traditional Transformer models, FFN layers consist of two linear transformations with a ReLU activation between them. In the proposed structure, KAN layers first replace the FFN layers without any change to the model dimensions; experiments are then conducted with lower-dimensional KAN layers and various parameter sets. The study is carried out on Turkish–English and Kazakh–English language pairs. The findings reveal that using KAN layers instead of FFN layers has a positive effect on translation success, and that KAN layers of similar or lower dimensionality significantly increase the performance of the Transformer model.
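To make the architectural substitution concrete, the following is a minimal PyTorch sketch, not the authors' code, contrasting the standard position-wise FFN with a drop-in KAN-style replacement. The SimpleKANLayer shown here is an illustrative assumption: it builds learnable per-edge activations from a fixed radial-basis expansion rather than the B-spline bases of the original KAN formulation, and all dimensions are placeholders.

    # Minimal sketch (illustrative only): standard Transformer FFN vs. a
    # simplified KAN-style drop-in replacement. Hyperparameters are placeholders.
    import torch
    import torch.nn as nn

    class TransformerFFN(nn.Module):
        """Standard position-wise FFN: two linear maps with a ReLU in between."""
        def __init__(self, d_model: int, d_hidden: int):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.ReLU(),
                nn.Linear(d_hidden, d_model),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.net(x)

    class SimpleKANLayer(nn.Module):
        """Simplified KAN-style layer: learnable per-edge activations over a
        fixed radial-basis expansion (an assumption, not the original B-spline basis)."""
        def __init__(self, in_dim: int, out_dim: int, num_basis: int = 8):
            super().__init__()
            # Fixed basis centres on [-1, 1]; the coefficients are the learnable part.
            self.register_buffer("centres", torch.linspace(-1.0, 1.0, num_basis))
            self.coeffs = nn.Parameter(0.1 * torch.randn(out_dim, in_dim, num_basis))
            self.base = nn.Linear(in_dim, out_dim)  # linear path alongside the learned activations

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (..., in_dim) -> RBF features: (..., in_dim, num_basis)
            phi = torch.exp(-((x.unsqueeze(-1) - self.centres) ** 2))
            # Sum the learnable edge activations over the input dimension: (..., out_dim)
            spline = torch.einsum("...ib,oib->...o", phi, self.coeffs)
            return self.base(x) + spline

    class KANFFN(nn.Module):
        """Drop-in replacement for the FFN block built from two KAN-style layers."""
        def __init__(self, d_model: int, d_hidden: int):
            super().__init__()
            self.kan1 = SimpleKANLayer(d_model, d_hidden)
            self.kan2 = SimpleKANLayer(d_hidden, d_model)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.kan2(self.kan1(x))

    # Both modules map (batch, seq_len, d_model) -> (batch, seq_len, d_model),
    # so either can sit after the self-attention sublayer; the KAN variant may
    # use a smaller hidden dimension, as explored in the experiments.
    x = torch.randn(2, 16, 512)
    print(TransformerFFN(512, 2048)(x).shape, KANFFN(512, 256)(x).shape)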
Keywords