Impact of Mixed Precision Techniques on Training and Inference Efficiency of Deep Neural Networks

Marion Dorrich; Mingcheng Fan; Andreas M. Kist

doi:10.1109/ACCESS.2023.3284388

IEEE Access (Jan 2023)

Affiliations

Marion Dorrich: Department of Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Mingcheng Fan: Department of Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Andreas M. Kist: Department of Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany

DOI: https://doi.org/10.1109/ACCESS.2023.3284388
Journal volume & issue: Vol. 11, pp. 57627–57634

Abstract

In the deep learning community, increasingly large models are being developed, leading to rapidly growing computational and energy costs. A recent trend advocates that researchers report the energy efficiency of their models alongside performance. Previous research has shown that reduced precision can improve energy efficiency. Based on this finding, we propose a simple practice to improve the energy efficiency of training and inference: training the model with mixed precision and deploying it on Edge TPUs. We evaluated its effectiveness by comparing the speedup of a state-of-the-art semantic segmentation architecture across typical usage scenarios, including different devices, deep learning frameworks, model sizes, and batch sizes. Our results show that enabling mixed precision yields up to a $1.9\times$ speedup over the default float32 data type on GPUs. Deploying the models on Edge TPUs further accelerated inference by a factor of 6. Our approach allows researchers to accelerate training and inference without jeopardizing the model’s accuracy, while reducing energy consumption and electricity cost without changing the model architecture or retraining. Furthermore, our approach helps reduce the carbon footprint of training and deploying neural networks and thus has a positive effect on environmental resources.
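
The abstract does not describe how mixed precision works internally. As a rough illustration only, and assuming the common scheme in which most arithmetic runs in float16 while sensitive values such as weights stay in float32, the following sketch uses Python's standard library to show the trade-off: float16 halves storage per value but can silently drop small updates that float32 retains. The helper names `to_float16` and `to_float32` and the sample numbers are hypothetical and do not come from the paper.

```python
import struct


def to_float16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_float32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Storage cost per value: float16 needs half the bytes of float32.
BYTES_FP16 = struct.calcsize("<e")
BYTES_FP32 = struct.calcsize("<f")

# Hypothetical weight and a small update (illustrative values only).
weight = 1.0
update = 1e-4

# Pure float16: the update is smaller than the spacing between
# representable values near 1.0, so it is rounded away.
fp16_result = to_float16(to_float16(weight) + to_float16(update))

# float32 accumulation: the same update is preserved.
fp32_result = to_float32(to_float32(weight) + to_float32(update))

print(f"bytes per value: fp16={BYTES_FP16}, fp32={BYTES_FP32}")
print(f"fp16 weight after update: {fp16_result}")
print(f"fp32 weight after update: {fp32_result}")
```

This is why mixed precision schemes typically combine the two formats rather than switching everything to float16; the specific framework settings used in the paper are not given in this abstract.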

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
