A variable metric proximal stochastic gradient method: An application to classification problems

Pasquale Cascarano; Giorgia Franchini; Erich Kobler; Federica Porta; Andrea Sebastiani

EURO Journal on Computational Optimization (Jan 2024)

Affiliations

Pasquale Cascarano: Department of the Arts, University of Bologna, Bologna, Italy
Giorgia Franchini: Department of Physics, Informatics and Mathematics, University of Modena and Reggio Emilia, Modena, Italy; Corresponding author.
Erich Kobler: Department of Neuroradiology, University Hospital Bonn, Bonn, Germany
Federica Porta: Department of Physics, Informatics and Mathematics, University of Modena and Reggio Emilia, Modena, Italy
Andrea Sebastiani: Department of Mathematics, University of Bologna, Bologna, Italy; Department of Physics, Informatics and Mathematics, University of Modena and Reggio Emilia, Modena, Italy

Journal volume & issue: Vol. 12, p. 100088

Abstract

Due to the continued success of machine learning, and deep learning in particular, supervised classification problems are ubiquitous in numerous scientific fields. Training these models typically involves minimizing the empirical risk over large data sets, together with a possibly non-differentiable regularization term. In this paper, we introduce a stochastic gradient method for such classification problems. To control the variance of the objective's gradients, we use an automatic sample size selection along with a variable metric to precondition the stochastic gradient directions. Further, we use a non-monotone line search to automate the step size selection. Convergence results are provided for both convex and non-convex objective functions. Extensive numerical experiments verify that the suggested approach performs on par with state-of-the-art methods for training both statistical models for binary classification and artificial neural networks for multi-class image classification. The code is publicly available at https://github.com/koblererich/lisavm.
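
As a rough illustration of what a variable metric proximal gradient step can look like, the sketch below applies one update with a diagonal metric and an l1 regularizer. It is not taken from the paper or its repository: the diagonal metric, the l1 penalty, and the names `soft_threshold`, `vm_prox_grad_step`, `metric_diag`, `step`, and `lam` are all assumptions made for the example. The automatic sample size selection and the non-monotone line search described in the abstract are not reproduced here.

```python
from typing import List


def soft_threshold(z: float, tau: float) -> float:
    """Proximal operator of tau * |.| evaluated at z (scalar soft-thresholding)."""
    if z > tau:
        return z - tau
    if z < -tau:
        return z + tau
    return 0.0


def vm_prox_grad_step(
    x: List[float],
    grad: List[float],
    metric_diag: List[float],
    step: float,
    lam: float,
) -> List[float]:
    """One scaled proximal gradient step for an l1-regularized objective.

    Assumes a diagonal metric D = diag(metric_diag) with positive entries
    (an illustrative choice). The update solves, coordinate by coordinate,
        min_y  lam * |y_i| + d_i / (2 * step) * (y_i - z_i)^2,
    where z = x - step * D^{-1} grad, which reduces to soft-thresholding
    with threshold step * lam / d_i.
    """
    x_new = []
    for xi, gi, di in zip(x, grad, metric_diag):
        zi = xi - step * gi / di  # preconditioned (stochastic) gradient step
        x_new.append(soft_threshold(zi, step * lam / di))
    return x_new


# Hypothetical inputs: `grad` stands in for a mini-batch gradient estimate.
x_next = vm_prox_grad_step(
    x=[1.0, -0.2, 0.5],
    grad=[0.5, 0.1, -1.0],
    metric_diag=[1.0, 2.0, 4.0],
    step=0.5,
    lam=0.2,
)
```

In this sketch a larger metric entry shrinks both the gradient step and the thresholding level for that coordinate, which is one way preconditioning changes the update compared with a plain proximal stochastic gradient step.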

Published in EURO Journal on Computational Optimization

ISSN: 2192-4406 (Print); 2192-4414 (Online)
Publisher: Elsevier
Country of publisher: United Kingdom
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Applied mathematics. Quantitative methods; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.journals.elsevier.com/euro-journal-on-computational-optimization
