IEEE Access (Jan 2021)
Escaping the Gradient Vanishing: Periodic Alternatives of Softmax in Attention Mechanism
Abstract
Softmax is widely used in neural networks for multiclass classification, gating structures, and attention mechanisms. The statistical assumption that the input is normally distributed underpins the gradient stability of Softmax. However, when Softmax is used in attention mechanisms such as Transformers, the correlation scores between embeddings are often not normally distributed, and the gradient vanishing problem appears; we confirm this point experimentally. In this work, we propose replacing the exponential function with periodic functions, and we examine several potential periodic alternatives to Softmax from the viewpoint of both value and gradient. Through experiments on a simple demo model based on LeViT, our method is shown to alleviate the gradient problem and to yield substantial improvements over Softmax and its variants. Further, we analyze the impact of pre-normalization on Softmax and on our methods, both mathematically and experimentally. Finally, we increase the depth of the demo model and demonstrate that our method remains applicable to deep structures. The code is available at https://github.com/slwang9353/Period-alternatives-of-Softmax.
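To illustrate the core idea, the sketch below normalizes attention scores with a bounded periodic function in place of the exponential. This is a minimal, hypothetical instantiation: the sin-based scoring, the epsilon term, and the helper name `periodic_attention_weights` are assumptions for illustration, not the paper's exact formulation, which is given in the linked repository.

```python
import torch

def periodic_attention_weights(scores: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Normalize attention scores with a non-negative periodic function
    instead of exp (illustrative sin-based variant, not the paper's exact choice)."""
    # sin(x) + 1 is bounded in [0, 2], so its gradient does not saturate
    # the way exp-based Softmax can when the score distribution is skewed.
    positive = torch.sin(scores) + 1.0
    return positive / (positive.sum(dim=-1, keepdim=True) + eps)

# Usage with scaled dot-product attention:
#   scores = q @ k.transpose(-2, -1) / d_k ** 0.5
#   attn = periodic_attention_weights(scores)
#   out = attn @ v
```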
Keywords