Journal of Frontiers of Computer Science and Technology (Jisuanji Kexue yu Tansuo), Jan 2024

Deep Learning Compiler Load Balancing Optimization Method for Model Training

WANG Li, GAO Kai, ZHAO Yaqian, LI Rengang, CAO Fang, GUO Zhenhua

DOI: https://doi.org/10.3778/j.issn.1673-9418.2209026
Journal volume & issue: Vol. 18, No. 1, pp. 111–126

Abstract

For compute-intensive artificial intelligence (AI) training tasks, computational graphs are increasingly complex, and data loading, partitioning of the computational graph, and load balancing of task scheduling have become key factors affecting computing performance. This paper proposes three optimization methods that bring the task scheduling of model training in deep learning compilers to a load-balanced state. Firstly, load balance between the CPU and back-end computing devices is achieved by automatically building an efficient pipeline for data loading and model training, which improves the overall energy efficiency of the system. Secondly, layered optimization of the computational graph is used to balance the load across back-end devices during scheduling. Finally, resource utilization of the back-end devices is improved by automatically building an efficient inter-layer pipeline. Experimental results show that the proposed optimization methods achieve system-wide load balancing while automatically mapping training tasks onto the underlying hardware. Compared with traditional deep learning frameworks and compilers such as TensorFlow and nGraph, the proposed approach achieves a 2%–10% performance improvement in training different AI models, and the overall power consumption of the training system can be reduced by more than 10%.

Keywords