Journal of Frontiers of Computer Science and Technology (Jisuanji Kexue yu Tansuo), Jan 2024

Deep Learning Compiler Load Balancing Optimization Method for Model Training

WANG Li, GAO Kai, ZHAO Yaqian, LI Rengang, CAO Fang, GUO Zhenhua

DOI: https://doi.org/10.3778/j.issn.1673-9418.2209026
Journal volume & issue: Vol. 18, No. 1, pp. 111–126

Abstract

For compute-intensive artificial intelligence (AI) training tasks, computational graphs are increasingly complex, and data loading, partitioning of the computational graph, and load balancing of task scheduling have become key factors affecting computing performance. This paper proposes three optimization methods that bring the task scheduling of model training in deep learning compilers to a load-balanced state. Firstly, load balance between the CPU and back-end computing devices is achieved by automatically building an efficient pipeline for data loading and model training, which improves the overall energy efficiency of the system. Secondly, layered optimization of the computational graph is used to balance the load across back-end devices during scheduling. Finally, resource utilization of the back-end devices is improved by automatically building an efficient inter-layer pipeline. Experimental results show that the proposed optimization methods achieve system-wide load balancing while automatically mapping training tasks onto the underlying hardware. Compared with traditional deep learning frameworks and compilers such as TensorFlow and nGraph, the proposed approach achieves a 2%–10% performance improvement in training different AI models, and the overall power consumption of the training system can be reduced by more than 10%.

Keywords