Accelerating Symmetric Rank-1 Quasi-Newton Method with Nesterov’s Gradient for Training Neural Networks

S. Indrapriyadarsini; Shahrzad Mahboubi; Hiroshi Ninomiya; Takeshi Kamio; Hideki Asai

doi:10.3390/a15010006

Algorithms (Dec 2021)

Affiliations

S. Indrapriyadarsini: Graduate School of Science and Technology, Shizuoka University, Hamamatsu 432-8561, Shizuoka, Japan
Shahrzad Mahboubi: Graduate School of Electrical and Information Engineering, Shonan Institute of Technology, Fujisawa 251-8511, Kanagawa, Japan
Hiroshi Ninomiya: Graduate School of Electrical and Information Engineering, Shonan Institute of Technology, Fujisawa 251-8511, Kanagawa, Japan
Takeshi Kamio: Graduate School of Information Sciences, Hiroshima City University, Hiroshima 731-3194, Japan
Hideki Asai: Research Institute of Electronics, Shizuoka University, Hamamatsu 432-8561, Shizuoka, Japan

DOI: https://doi.org/10.3390/a15010006
Journal volume & issue: Vol. 15, no. 1
p. 6

Abstract

Gradient-based methods are widely used in training neural networks and can be broadly categorized into first-order and second-order methods. Second-order methods have been shown to converge better than first-order methods, especially on highly nonlinear problems. The BFGS quasi-Newton method is the most commonly studied second-order method for neural network training. Recent methods have been shown to speed up the convergence of the BFGS method using Nesterov’s accelerated gradient and momentum terms. The SR1 quasi-Newton method, though less commonly used in training neural networks, is known to have interesting properties and to provide good Hessian approximations when used with a trust-region approach. Thus, this paper investigates accelerating the Symmetric Rank-1 (SR1) quasi-Newton method with Nesterov’s gradient for training neural networks and briefly discusses its convergence. The performance of the proposed method is evaluated on a function approximation problem and an image classification problem.
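
To make the idea in the abstract concrete, the following is a minimal sketch of how an SR1 inverse-Hessian update might be combined with a Nesterov-style look-ahead gradient. It is not the algorithm from the paper: the trust-region machinery mentioned in the abstract is omitted, and the function names (`sr1_update`, `nesterov_sr1`), the momentum coefficient `mu`, the step size `lr`, the skip threshold `r`, and the stopping tolerance `tol` are all assumptions chosen for illustration.

```python
import numpy as np


def sr1_update(H, s, y, r=1e-8):
    """Textbook SR1 update of an inverse-Hessian approximation H.

    s is the step in the parameters and y the matching change in gradient.
    """
    u = s - H @ y
    denom = u @ y
    # Standard safeguard: skip the update when the denominator is
    # negligible, since the rank-1 correction would be ill-defined.
    if abs(denom) <= r * np.linalg.norm(u) * np.linalg.norm(y):
        return H
    return H + np.outer(u, u) / denom


def nesterov_sr1(grad, w0, mu=0.8, lr=1.0, iters=100, tol=1e-10):
    """Illustrative loop (assumed, not the paper's method): momentum step
    with the gradient evaluated at the look-ahead point w + mu * v."""
    w = np.asarray(w0, dtype=float)
    v = np.zeros_like(w)
    H = np.eye(w.size)  # initial inverse-Hessian approximation
    for _ in range(iters):
        look = w + mu * v            # Nesterov look-ahead point
        g = grad(look)               # gradient at the look-ahead point
        v = mu * v - lr * (H @ g)    # quasi-Newton direction plus momentum
        w = w + v
        g_new = grad(w)
        if np.linalg.norm(g_new) < tol:
            break
        # Curvature pair measured between the look-ahead point and new iterate.
        H = sr1_update(H, w - look, g_new - g)
    return w
```

As a quick check, `nesterov_sr1(lambda x: A @ x - b, np.zeros(2))` with a small symmetric positive definite `A` should return a point close to `np.linalg.solve(A, b)`. Without a trust region or line search, this sketch has no convergence safeguard on general nonconvex losses, and SR1 approximations may become indefinite, which is presumably why the abstract pairs SR1 with a trust-region approach.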

Published in Algorithms

ISSN: 1999-4893 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.mdpi.com/journal/algorithms
