Multiplexed gradient descent: Fast online training of modern datasets on hardware neural networks without backpropagation

Adam N. McCaughan; Bakhrom G. Oripov; Natesh Ganesh; Sae Woo Nam; Andrew Dienstfrey; Sonia M. Buckley

doi:10.1063/5.0157645

APL Machine Learning (Jun 2023)

Affiliations

Adam N. McCaughan: National Institute of Standards and Technology, Boulder, Colorado 80305, USA
Bakhrom G. Oripov: National Institute of Standards and Technology, Boulder, Colorado 80305, USA
Natesh Ganesh: National Institute of Standards and Technology, Boulder, Colorado 80305, USA
Sae Woo Nam: National Institute of Standards and Technology, Boulder, Colorado 80305, USA
Andrew Dienstfrey: National Institute of Standards and Technology, Boulder, Colorado 80305, USA
Sonia M. Buckley: National Institute of Standards and Technology, Boulder, Colorado 80305, USA

Journal volume & issue: Vol. 1, no. 2
pp. 026118 – 026118-14

Abstract

We present multiplexed gradient descent (MGD), a gradient descent framework designed to easily train analog or digital neural networks in hardware. MGD uses zero-order optimization techniques for online training of hardware neural networks. We demonstrate its ability to train neural networks on modern machine learning datasets, including CIFAR-10 and Fashion-MNIST, and compare its performance to backpropagation. Assuming realistic timescales and hardware parameters, our results indicate that these optimization techniques can train a network on emerging hardware platforms in a wall-clock time orders of magnitude shorter than that of training via backpropagation on a standard GPU, even in the presence of imperfect weight updates or device-to-device variations in the hardware. We additionally describe how MGD can be applied to existing hardware as part of chip-in-the-loop training or integrated directly at the hardware level. Crucially, because the MGD framework is model-free, it can be applied to nearly any hardware platform with tunable parameters, and its gradient descent process can be optimized to compensate for specific hardware limitations, such as slow parameter-update speeds or limited input bandwidth.
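
To illustrate what "zero-order" and "model-free" mean in practice, the following is a minimal sketch of a generic simultaneous-perturbation gradient estimate. It is not the authors' MGD algorithm; the abstract does not give that level of detail. The toy linear model, the names `cost`, `perturbative_step`, and `train`, and the values chosen for `eps` and `lr` are all assumptions made for illustration. The point is only that the update uses two cost measurements and never differentiates through the model, so the cost could in principle be read out from a hardware device.

```python
import random


def cost(weights, x, target):
    """Squared error of a toy linear model (assumed stand-in for a hardware cost readout)."""
    y = sum(w * xi for w, xi in zip(weights, x))
    return (y - target) ** 2


def perturbative_step(weights, x, target, rng, eps=1e-3, lr=0.05):
    """One zero-order update: perturb every weight at once and measure the cost change."""
    # Random +/-1 perturbation direction applied to all weights simultaneously.
    signs = [rng.choice((-1.0, 1.0)) for _ in weights]
    base = cost(weights, x, target)
    perturbed = [w + eps * s for w, s in zip(weights, signs)]
    delta = cost(perturbed, x, target) - base
    # (delta / eps) * s_i is an estimate of dC/dw_i that is correct on average.
    return [w - lr * (delta / eps) * s for w, s in zip(weights, signs)]


def train(steps=3000, seed=0):
    """Fit the toy model online, one randomly drawn sample per step."""
    rng = random.Random(seed)
    true_w = [0.5, -1.0, 2.0]  # hypothetical target weights
    weights = [0.0, 0.0, 0.0]
    for _ in range(steps):
        x = [rng.uniform(-1.0, 1.0) for _ in true_w]
        target = sum(w * xi for w, xi in zip(true_w, x))
        weights = perturbative_step(weights, x, target, rng)
    return weights


if __name__ == "__main__":
    print(train())
```

Because only `cost` touches the model, swapping it for a measurement on a physical device would leave the update rule unchanged, which is the sense in which such methods are model-free.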

Published in APL Machine Learning

ISSN: 2770-9019 (Online)
Publisher: AIP Publishing LLC
Country of publisher: United States
LCC subjects: Science: Physics; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://pubs.aip.org/aip/aml
