Frontiers in Applied Mathematics and Statistics (Dec 2023)

Real block-circulant matrices and DCT-DST algorithm for transformer neural network

  • Euis Asriani,
  • Intan Muchtadi-Alamsyah,
  • Ayu Purwarianti

DOI
https://doi.org/10.3389/fams.2023.1260187
Journal volume & issue
Vol. 9

Abstract


In the encoding and decoding process of transformer neural networks, a weight matrix-vector multiplication occurs in each multihead attention and feed-forward sublayer. Choosing an appropriate weight matrix structure and algorithm can improve transformer performance, especially for machine translation tasks. In this study, we investigate the use of real block-circulant matrices together with an alternative to the commonly used fast Fourier transform (FFT) algorithm, namely the discrete cosine transform–discrete sine transform (DCT-DST) algorithm, for implementation in a transformer. We explore three transformer models that combine real block-circulant matrices with different algorithms. We start by generating two orthogonal matrices, U and Q. The matrix U is spanned by combinations of the real and imaginary parts of the eigenvectors of the real block-circulant matrix, whereas Q is defined such that the product QU can be represented as a DCT-DST matrix. The final step is defining the Schur form of the real block-circulant matrix. We find that the matrix-vector multiplication using the DCT-DST algorithm can be expressed through the Kronecker product of the DCT-DST matrix and an orthogonal matrix whose order equals the dimension of the circulant matrix that spans the real block-circulant matrix. According to the experimental findings, the dense real block-circulant DCT-DST model with the largest matrix dimension reduced the number of model parameters by up to 41%. At a matrix dimension of 128, the same model achieved a BLEU score of 26.47, higher than the other two models at the same matrix dimension.
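The parameter saving described above comes from replacing a dense weight matrix with a block-circulant one, in which each b × b block is determined by a single length-b defining vector. The following minimal NumPy sketch illustrates that idea; it is not the authors' implementation, and it uses the standard FFT diagonalization of circulant matrices, whereas the paper develops a DCT-DST algorithm as a real-arithmetic alternative. All names, shapes, and sizes below are illustrative assumptions.

    import numpy as np

    def circulant_matvec(c, x):
        # Multiply the circulant matrix whose first column is c by the vector x,
        # using the FFT diagonalization C = F^{-1} diag(F c) F. The paper studies
        # a DCT-DST algorithm as a real-arithmetic alternative to this FFT route.
        return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

    def block_circulant_matvec(W, x, b):
        # W has shape (p, q, b): one defining vector per b x b circulant block,
        # so a (p*b) x (q*b) weight is stored with p*q*b parameters instead of
        # p*q*b*b for a dense matrix.
        p, q, _ = W.shape
        x_blocks = x.reshape(q, b)
        y = np.zeros((p, b))
        for i in range(p):
            for j in range(q):
                y[i] += circulant_matvec(W[i, j], x_blocks[j])
        return y.reshape(p * b)

    # Illustrative sizes: a 512 x 512 weight stored as 8 x 8 circulant blocks of size 64
    p = q = 8
    b = 64
    W = np.random.randn(p, q, b)   # 4,096 stored parameters vs. 262,144 dense
    x = np.random.randn(q * b)
    y = block_circulant_matvec(W, x, b)
    print(y.shape)                 # (512,)

In this sketch the per-layer storage drops from (p*b)^2 entries to p*q*b, which is the structural source of the overall parameter reduction reported in the abstract; the exact 41% figure depends on how many layers of the transformer use the block-circulant form.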

Keywords