IEEE Open Journal of Control Systems (Jan 2023)

Low-Rank Gradient Descent

  • Romain Cosson,
  • Ali Jadbabaie,
  • Anuran Makur,
  • Amirhossein Reisizadeh,
  • Devavrat Shah

DOI
https://doi.org/10.1109/OJCSYS.2023.3315088
Journal volume & issue
Vol. 2, pp. 380–395

Abstract

Several recent empirical studies demonstrate that important machine learning tasks, such as training deep neural networks, exhibit low-rank structure, where most of the variation in the loss function occurs only in a few directions of the input space. In this article, we leverage such low-rank structure to reduce the high computational cost of canonical gradient-based methods such as gradient descent (GD). Our proposed Low-Rank Gradient Descent (LRGD) algorithm finds an $\epsilon$-approximate stationary point of a $p$-dimensional function by first identifying $r \leq p$ significant directions, and then estimating the true $p$-dimensional gradient at every iteration by computing directional derivatives only along those $r$ directions. We establish that the "directional oracle complexities" of LRGD for strongly convex and non-convex objective functions are $\mathcal{O}(r \log(1/\epsilon) + rp)$ and $\mathcal{O}(r/\epsilon^{2} + rp)$, respectively. Therefore, when $r \ll p$, LRGD provides significant improvement over the known complexities of $\mathcal{O}(p \log(1/\epsilon))$ and $\mathcal{O}(p/\epsilon^{2})$ of GD in the strongly convex and non-convex settings, respectively. Furthermore, we formally characterize the classes of exactly and approximately low-rank functions. Empirically, on real and synthetic data, LRGD provides significant gains over GD when the data has low-rank structure, and in the absence of such structure, LRGD does not degrade performance compared to GD. This suggests that LRGD could be used in place of GD in practice in any setting.
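To make the two-phase procedure described in the abstract concrete, below is a minimal Python sketch of the idea, not the authors' implementation: the subspace-identification phase (finite-difference gradients at a few probe points followed by a top-r SVD), the directional-derivative oracle, and all parameter names (probe, step, h) are assumptions made purely for exposition.

import numpy as np

def directional_derivative(f, x, u, h=1e-6):
    # Central finite-difference estimate of the derivative of f at x along unit direction u.
    return (f(x + h * u) - f(x - h * u)) / (2 * h)

def lrgd(f, x0, r, step=0.1, iters=100, probe=None, h=1e-6):
    """Illustrative low-rank gradient descent sketch.

    Phase 1: estimate r significant directions U (p x r) from a few probe gradients.
    Phase 2: descend using gradients reconstructed from only r directional derivatives.
    """
    p = x0.size
    probe = probe if probe is not None else max(r, 10)  # number of probe points (assumed)
    e = np.eye(p)

    # Phase 1: full finite-difference gradients at random probe points (cost ~ p per point,
    # consistent with the O(rp) identification term), then keep the top-r left singular vectors.
    G = np.zeros((p, probe))
    for j in range(probe):
        z = x0 + np.random.randn(p)
        G[:, j] = np.array([directional_derivative(f, z, e[:, i], h) for i in range(p)])
    U, _, _ = np.linalg.svd(G, full_matrices=False)
    U = U[:, :r]  # r significant directions

    # Phase 2: each iteration uses only r directional derivatives along the columns of U.
    x = x0.copy()
    for _ in range(iters):
        d = np.array([directional_derivative(f, x, U[:, i], h) for i in range(r)])
        x = x - step * (U @ d)  # low-rank gradient estimate U @ d
    return x

In this sketch, each descent iteration costs r directional-derivative queries rather than p, which is the source of the complexity gains stated above when r is much smaller than p.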

Keywords