AdaCB: An Adaptive Gradient Method with Convergence Range Bound of Learning Rate

Xuanzhi Liao; Shahnorbanun Sahran; Azizi Abdullah; Syaimak Abdul Shukor

doi:10.3390/app12189389

Applied Sciences (Sep 2022)

Affiliations

Xuanzhi Liao: Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia (UKM), Bangi 43600, Selangor, Malaysia
Shahnorbanun Sahran: Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia (UKM), Bangi 43600, Selangor, Malaysia
Azizi Abdullah: Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia (UKM), Bangi 43600, Selangor, Malaysia
Syaimak Abdul Shukor: Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia (UKM), Bangi 43600, Selangor, Malaysia

DOI: https://doi.org/10.3390/app12189389
Journal volume & issue: Vol. 12, no. 18, p. 9389

Abstract

Adaptive gradient descent methods such as Adam, RMSprop, and AdaGrad have achieved great success in training deep learning models. These methods adapt the learning rates during training, which results in faster convergence. Recent studies, however, have shown that they suffer from extreme learning rates, non-convergence, and poor generalization. Enhanced variants such as AMSGrad and AdaBound have been proposed, but their performance is controversial and some drawbacks remain. In this work, we propose an optimizer called AdaCB, which limits the learning rates of Adam to a convergence range bound. The bound range is determined by the learning rate (LR) test; two bound functions are then designed to constrain Adam's learning rates, and both functions tend to a constant value. To evaluate our method, we carry out experiments on image classification, training three models (Smallnet, Network in Network, and ResNet) on the CIFAR10 and CIFAR100 datasets. Experimental results show that our method outperforms the other optimizers on CIFAR10 and CIFAR100, with accuracies of (82.76%, 53.29%), (86.24%, 60.19%), and (83.24%, 55.04%) on Smallnet, Network in Network, and ResNet, respectively. The results also indicate that our method maintains the fast learning speed of adaptive gradient methods in the early stage and achieves considerable accuracy, like SGD (M), at the end.
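
The idea of constraining Adam's per-parameter learning rate between two bound functions can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the form of the bound functions, so `lower_bound`, `upper_bound`, the decay parameter `gamma`, and the assumption that both bounds approach the same constant `lr_final` are hypothetical choices made here for illustration. In the paper, the range endpoints (`lr_min`, `lr_max` below) would come from the LR test rather than being fixed by hand.

```python
import math


def lower_bound(t, lr_min, lr_final, gamma=1e-3):
    # Hypothetical schedule: equals lr_min at t = 0 and rises toward lr_final.
    return lr_final - (lr_final - lr_min) / (1.0 + gamma * t)


def upper_bound(t, lr_max, lr_final, gamma=1e-3):
    # Hypothetical schedule: equals lr_max at t = 0 and falls toward lr_final.
    return lr_final + (lr_max - lr_final) / (1.0 + gamma * t)


def bounded_adam_step(theta, grad, m, v, t, alpha=1e-3, beta1=0.9,
                      beta2=0.999, eps=1e-8,
                      lr_min=1e-3, lr_max=1.0, lr_final=0.1):
    """One Adam update whose element-wise step size is clipped to [lo, hi].

    theta, grad, m, v are lists of floats; t is the 1-based step count.
    Assumes lr_min <= lr_final <= lr_max so that lo <= hi for every t.
    """
    lo = lower_bound(t, lr_min, lr_final)
    hi = upper_bound(t, lr_max, lr_final)
    new_theta, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(theta, grad, m, v):
        # Standard Adam moment estimates with bias correction.
        m_i = beta1 * m_i + (1.0 - beta1) * g
        v_i = beta2 * v_i + (1.0 - beta2) * g * g
        m_hat = m_i / (1.0 - beta1 ** t)
        v_hat = v_i / (1.0 - beta2 ** t)
        # Adaptive step size, then clipped into the bound range.
        step_size = alpha / (math.sqrt(v_hat) + eps)
        step_size = min(max(step_size, lo), hi)
        new_theta.append(p - step_size * m_hat)
        new_m.append(m_i)
        new_v.append(v_i)
    return new_theta, new_m, new_v


# Usage: minimise f(x) = x^2, whose gradient is 2x.
theta, m, v = [5.0], [0.0], [0.0]
for t in range(1, 201):
    grad = [2.0 * x for x in theta]
    theta, m, v = bounded_adam_step(theta, grad, m, v, t)
```

Because the two bounds narrow toward a single constant in this sketch, the update behaves like Adam while the range is wide and like SGD with momentum and a fixed learning rate once the range has closed, which mirrors the early-fast, late-SGD-like behaviour described in the abstract.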

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
