IEEE Access (Jan 2019)

Trend-Smooth: Accelerate Asynchronous SGD by Smoothing Parameters Using Parameter Trends

  • Guoxin Cui,
  • Jiafeng Guo,
  • Yixing Fan,
  • Yanyan Lan,
  • Xueqi Cheng

DOI
https://doi.org/10.1109/ACCESS.2019.2949611
Journal volume & issue
Vol. 7
pp. 156848–156859

Abstract

Stochastic gradient descent (SGD) is the fundamental sequential method for training large-scale machine learning models. To accelerate training, researchers have proposed the asynchronous stochastic gradient descent (A-SGD) method. However, because parameters are updated with stale information, A-SGD converges more slowly than SGD for the same number of iterations; it also often converges to a higher loss value, resulting in lower model accuracy. In this paper, we propose a novel algorithm called Trend-Smooth, which can be adapted to the asynchronous parallel environment to overcome these problems. Specifically, Trend-Smooth exploits the parameter trend observed during training to shrink the learning rate in dimensions where the gradient's direction is opposite to the parameter's trend. Experiments on the MNIST and CIFAR-10 datasets confirm that Trend-Smooth accelerates convergence in asynchronous training. The test accuracy Trend-Smooth achieves is higher than that of other asynchronous parallel baseline methods and very close to that of SGD. Moreover, Trend-Smooth can be combined with adaptive learning rate methods (such as Momentum, RMSProp, and Adam) in the asynchronous parallel environment to improve their performance.
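The core idea described above can be illustrated with a minimal sketch. Note that the exact update rule, the definition of the parameter trend, and all hyperparameter names below (`beta`, `shrink`) are assumptions for illustration, not the paper's precise formulation: here the trend is taken as an exponential moving average of recent parameter changes, and the per-dimension learning rate is shrunk wherever the descent direction opposes that trend.

```python
import numpy as np

def trend_smooth_step(w, grad, trend, lr=0.1, beta=0.9, shrink=0.5):
    """One hypothetical Trend-Smooth-style update (illustrative sketch only).

    `trend` approximates the recent direction of movement of each parameter
    as an exponential moving average of parameter changes. Dimensions whose
    descent direction (-grad) opposes the trend receive a shrunken learning
    rate; the paper's actual rule may differ.
    """
    # Per-dimension check: does the descent direction disagree with the trend?
    opposes = np.sign(-grad) * np.sign(trend) < 0
    # Shrink the learning rate only in the opposing dimensions.
    lr_vec = np.where(opposes, lr * shrink, lr)
    step = -lr_vec * grad
    # Update the trend estimate with the change just applied.
    new_trend = beta * trend + (1 - beta) * step
    return w + step, new_trend

# Toy usage: minimize f(w) = ||w||^2, whose gradient is 2w.
w = np.array([1.0, -1.0])
trend = np.zeros_like(w)
for _ in range(200):
    w, trend = trend_smooth_step(w, 2 * w, trend)
```

In an asynchronous setting, a stale gradient that pushes a parameter against its established trend is the case this damping targets; the sketch applies the same logic in a single-worker loop purely for clarity.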
