MixTrain: accelerating DNN training via input mixing

Sarada Krithivasan; Sanchari Sen; Swagath Venkataramani; Anand Raghunathan

doi:10.3389/frai.2024.1387936

Frontiers in Artificial Intelligence (Sep 2024)

Affiliations

Sarada Krithivasan: Department of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, United States
Sanchari Sen: IBM Research, Yorktown Heights, NY, United States
Swagath Venkataramani: IBM Research, Yorktown Heights, NY, United States
Anand Raghunathan: Department of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, United States

DOI: https://doi.org/10.3389/frai.2024.1387936
Journal volume & issue: Vol. 7

Abstract

Training Deep Neural Networks (DNNs) places immense compute requirements on the underlying hardware platforms, expending large amounts of time and energy. An important factor contributing to the long training times is the increasing dataset complexity required to reach state-of-the-art performance in real-world applications. To address this challenge, we explore the use of input mixing, where multiple inputs are combined into a single composite input with an associated composite label for training. The goal is for training on the mixed input to have an effect similar to training separately on each of the constituent inputs that it represents. This results in fewer inputs (or mini-batches) to be processed in each epoch, proportionally reducing training time. We find that naive input mixing leads to a considerable drop in learning performance and model accuracy due to interference between the forward/backward propagation of the mixed inputs. We propose two strategies to address this challenge and realize training speedups from input mixing with minimal impact on accuracy. First, we reduce the impact of inter-input interference by exploiting the spatial separation between the features of the constituent inputs in the network's intermediate representations. We also adaptively vary the mixing ratio of constituent inputs based on their loss in previous epochs. Second, we propose heuristics to automatically identify the subset of the training dataset that is subject to mixing in each epoch. Across ResNets of varying depth, MobileNetV2 and two Vision Transformer networks, we obtain up to 1.6× and 1.8× speedups in training for the ImageNet and CIFAR-10 datasets, respectively, on an Nvidia RTX 2080 Ti GPU, with negligible loss in classification accuracy.
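To make the basic idea of input mixing concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simple weighted-average mixing operator applied to pairs of inputs and their one-hot labels, which the abstract does not specify. The function names (`one_hot`, `mix_pair`, `mix_dataset`) and the fixed `ratio` parameter are hypothetical, and plain Python lists stand in for image tensors. The spatial-separation strategy, the loss-based adaptive mixing ratio, and the subset-selection heuristics described above are not reproduced here.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def one_hot(label: int, num_classes: int) -> Vector:
    """Return a one-hot label vector of length num_classes."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def mix_pair(x_a: Sequence[float], x_b: Sequence[float],
             y_a: Sequence[float], y_b: Sequence[float],
             ratio: float) -> Tuple[Vector, Vector]:
    """Blend two inputs and their labels with the same ratio.

    Assumption: a weighted average is used as the mixing operator.
    """
    mixed_x = [ratio * a + (1.0 - ratio) * b for a, b in zip(x_a, x_b)]
    mixed_y = [ratio * a + (1.0 - ratio) * b for a, b in zip(y_a, y_b)]
    return mixed_x, mixed_y


def mix_dataset(inputs: Sequence[Sequence[float]], labels: Sequence[int],
                num_classes: int, ratio: float = 0.5
                ) -> List[Tuple[Vector, Vector]]:
    """Pair consecutive samples so an epoch holds about half as many inputs."""
    mixed: List[Tuple[Vector, Vector]] = []
    for i in range(0, len(inputs) - 1, 2):
        y_a = one_hot(labels[i], num_classes)
        y_b = one_hot(labels[i + 1], num_classes)
        mixed.append(mix_pair(inputs[i], inputs[i + 1], y_a, y_b, ratio))
    # An odd leftover sample is kept unmixed with its original label.
    if len(inputs) % 2 == 1:
        mixed.append((list(inputs[-1]), one_hot(labels[-1], num_classes)))
    return mixed


if __name__ == "__main__":
    toy_inputs = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
    toy_labels = [0, 1, 2, 2]
    for x, y in mix_dataset(toy_inputs, toy_labels, num_classes=3):
        print(x, y)
```

With four toy inputs, the sketch yields two composite inputs, which illustrates why the number of inputs processed per epoch shrinks in proportion to the number of inputs mixed together.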

Published in Frontiers in Artificial Intelligence

ISSN: 2624-8212 (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.frontiersin.org/journals/artificial-intelligence#
