MATEC Web of Conferences (Jan 2018)

An Optimization Strategy Based on Hybrid Algorithm of Adam and SGD

  • Wang Yijun,
  • Zhou Pengyu,
  • Zhong Wenya

DOI: https://doi.org/10.1051/matecconf/201823203007
Journal volume & issue: Vol. 232, p. 03007

Abstract


Despite superior training outcomes, adaptive optimization methods such as Adam, Adagrad, and RMSprop have been found to generalize poorly compared to stochastic gradient descent (SGD). To address this, Nitish Shirish Keskar et al. (2017) proposed a hybrid strategy that begins training with Adam and switches to SGD at the right time. In learning tasks with large output spaces, it has been observed that Adam can fail to converge to an optimal solution (or, in non-convex settings, to an extreme point) [1]. Therefore, this paper proposes a new variant of the Adam algorithm (AMSGrad), which not only resolves the convergence problem but also improves empirical performance.
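For reference, the AMSGrad variant mentioned above modifies Adam by keeping a running maximum of the second-moment estimate, so the effective step size is non-increasing. The update below is a standard formulation of that idea and uses the usual Adam notation (gradient $g_t$, decay rates $\beta_1, \beta_2$, step size $\alpha$, stability constant $\epsilon$); it is included here for clarity and is not quoted from the paper itself:

\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2 \\
\hat{v}_t &= \max(\hat{v}_{t-1},\, v_t) \\
\theta_{t+1} &= \theta_t - \frac{\alpha}{\sqrt{\hat{v}_t} + \epsilon}\, m_t
\end{aligned}

Replacing $v_t$ with $\hat{v}_t$ in the denominator is what distinguishes AMSGrad from plain Adam and underlies the convergence guarantee discussed in the abstract.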