IEEE Access (Jan 2021)

Handling Vanishing Gradient Problem Using Artificial Derivative

  • Zheng Hu,
  • Jiaojiao Zhang,
  • Yun Ge

DOI
https://doi.org/10.1109/access.2021.3054915
Journal volume & issue
Vol. 9
pp. 22371 – 22377

Abstract


The sigmoid function and ReLU are commonly used activation functions in neural networks (NN). However, the sigmoid function is vulnerable to the vanishing gradient problem, and ReLU suffers from a particular form of it known as the dying ReLU problem. Although many studies have proposed ways to alleviate this problem, no efficient and practical solution has emerged. Hence, we propose a method that replaces the original derivative function with an artificial derivative in a pertinent way. Our method optimizes the gradients of activation functions without modifying the activation functions themselves or introducing extra layers. Our experiments demonstrate that the method effectively alleviates the vanishing gradient problem for both ReLU and the sigmoid function at little computational cost.
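The abstract describes the idea of keeping the forward activation unchanged while substituting an artificial derivative during backpropagation. A minimal sketch of that idea, using a PyTorch custom autograd function, is shown below; the clamping rule and the 0.05 floor are illustrative assumptions, not the paper's exact formulation.

```python
import torch

class SigmoidArtificialGrad(torch.autograd.Function):
    """Sigmoid forward pass with a hypothetical 'artificial derivative' backward.

    The true derivative sigma(x) * (1 - sigma(x)) approaches zero for saturated
    inputs; here it is clamped from below so gradients never vanish entirely.
    The floor value is an assumption for illustration only.
    """

    @staticmethod
    def forward(ctx, x):
        y = torch.sigmoid(x)
        ctx.save_for_backward(y)
        return y

    @staticmethod
    def backward(ctx, grad_output):
        (y,) = ctx.saved_tensors
        true_grad = y * (1.0 - y)                     # at most 0.25, vanishes at the tails
        artificial_grad = true_grad.clamp(min=0.05)   # hypothetical lower bound on the derivative
        return grad_output * artificial_grad


# Usage: drop-in replacement for torch.sigmoid inside a model's forward pass.
x = torch.randn(4, requires_grad=True)
y = SigmoidArtificialGrad.apply(x)
y.sum().backward()
print(x.grad)  # gradients stay bounded away from zero even for saturated inputs
```

Because only the backward pass is altered, the network's predictions are identical to a standard sigmoid network; only the training signal changes, which is consistent with the abstract's claim of adding no extra layers.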

Keywords