Mathematics Interdisciplinary Research (Sep 2024)
Modified Step Size for Enhanced Stochastic Gradient Descent: Convergence and Experiments
Abstract
This paper introduces a modified decay step size based on $\frac{1}{\sqrt{t}}$ to enhance the performance of the stochastic gradient descent (SGD) algorithm. The proposed step size incorporates a logarithmic term, which yields smaller values in the final iterations. Our analysis establishes a convergence rate of $O\!\left(\frac{\ln T}{\sqrt{T}}\right)$ for smooth non-convex functions without the Polyak-Łojasiewicz condition. To evaluate the effectiveness of the approach, we conducted numerical experiments on image classification tasks using the Fashion-MNIST and CIFAR-10 datasets; compared to the traditional $\frac{1}{\sqrt{t}}$ step size, accuracy improves by $0.5\%$ and $1.4\%$, respectively. The source code is available at https://github.com/Shamaeem/LNSQRTStepSize.
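The abstract does not state the exact schedule; as an illustration only, one form consistent with its description (a $\frac{1}{\sqrt{t}}$ decay modified by a logarithmic term so that the step size becomes smaller in the final iterations) would be
\[
    \eta_t \;=\; \frac{\eta_0}{\sqrt{t}\,\bigl(1 + \ln t\bigr)}, \qquad t = 1, 2, \dots, T,
\]
where $\eta_0$ denotes the initial step size. This is a hedged sketch, not the paper's definition: the precise schedule is given in the body of the paper. The $(1+\ln t)$ factor makes $\eta_t$ strictly smaller than the classical $\frac{\eta_0}{\sqrt{t}}$ in late iterations, matching the behavior the abstract describes.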
Keywords