Mathematics (Oct 2024)

Improving Systematic Generalization of Linear Transformer Using Normalization Layers and Orthogonality Loss Function

  • Taewon Park,
  • Hyun-Chul Kim

DOI
https://doi.org/10.3390/math12213390
Journal volume & issue
Vol. 12, no. 21
p. 3390

Abstract

The Linear Transformer linearizes the attention mechanism of the vanilla Transformer architecture, significantly improving efficiency and achieving theoretically linear complexity with respect to sequence length. However, few studies have explored the capabilities of the Linear Transformer beyond its efficiency. In this work, we investigate the systematic generalization capability of the Linear Transformer, a crucial property for strong generalization to unseen data. Through preliminary experiments, we identify two major issues contributing to its unstable systematic generalization performance: (i) unconstrained norms of Queries and Keys, and (ii) high correlation among Values across the sequence. To address these issues, we propose two simple yet effective methods: normalization layers for Queries and Keys, and an orthogonality loss function applied to Values during training. In experiments, we demonstrate that applying these methods to the Linear Transformer significantly improves its stability and systematic generalization performance across several well-known tasks. Furthermore, with our proposed methods, the Linear Transformer outperforms the vanilla Transformer on specific systematic generalization tasks, such as the sort-of-CLEVR and SCAN tasks.
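
To make the two remedies described in the abstract concrete, the sketch below shows a single-head kernelized linear-attention layer with layer normalization applied to Queries and Keys, plus an orthogonality penalty on the Values. This is a minimal illustration under assumed choices (the `elu + 1` feature map, module and function names such as `NormalizedLinearAttention` and `orthogonality_loss`), not the authors' reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NormalizedLinearAttention(nn.Module):
    """Single-head linear attention with LayerNorm on Queries and Keys.

    Illustrative sketch of the paper's first idea (constraining Q/K norms);
    the feature map elu(x) + 1 is a common choice assumed here.
    """
    def __init__(self, dim):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.q_norm = nn.LayerNorm(dim)   # constrains Query norms
        self.k_norm = nn.LayerNorm(dim)   # constrains Key norms

    def forward(self, x):
        # x: (batch, seq_len, dim)
        q = F.elu(self.q_norm(self.to_q(x))) + 1   # positive feature map
        k = F.elu(self.k_norm(self.to_k(x))) + 1
        v = self.to_v(x)
        # Kernelized linear attention: cost linear in sequence length
        kv = torch.einsum("bnd,bne->bde", k, v)               # sum_n k_n v_n^T
        z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + 1e-6)
        out = torch.einsum("bnd,bde,bn->bne", q, kv, z)
        return out, v

def orthogonality_loss(v):
    """Penalize correlation among Value vectors across the sequence.

    Pushes the per-sequence Gram matrix of (unit-normalized) Values toward
    the identity, i.e. toward mutually orthogonal Values; a sketch of the
    paper's second idea.
    """
    v = F.normalize(v, dim=-1)                       # unit-norm Values
    gram = torch.einsum("bnd,bmd->bnm", v, v)        # pairwise similarities
    eye = torch.eye(gram.size(-1), device=v.device)
    return ((gram - eye) ** 2).mean()
```

During training, the penalty would be combined with the task objective, e.g. `loss = task_loss + lam * orthogonality_loss(v)` for some assumed weighting coefficient `lam`, encouraging less correlated Values across the sequence.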

Keywords