Mathematics (Dec 2022)

Adaptive Distributed Parallel Training Method for a Deep Learning Model Based on Dynamic Critical Paths of DAG

  • Yan Zeng,
  • Wei Wang,
  • Yong Ding,
  • Jilin Zhang,
  • Yongjian Ren,
  • Guangzheng Yi

DOI
https://doi.org/10.3390/math10244788
Journal volume & issue
Vol. 10, no. 24
p. 4788

Abstract


AI provides a new method for massive simulated data calculations in molecular dynamics, materials science, and other scientific computing fields. However, the complex structures and large-scale parameters of neural network models make them difficult to develop and train. Automatic parallelization based on graph algorithms is one of the most promising ways to address this problem, but the design, implementation, and execution of distributed parallel policies for large-scale neural network models remain inefficient. In this paper, we propose an adaptive distributed parallel training method based on the dynamic generation of critical paths in the DAG (directed acyclic graph), called FD-DPS, to solve this efficiency problem. Firstly, the proposed method splits operators along tensor dimensions, which expands the search space available for model parallelism. Secondly, a dynamic critical-path generation method is employed to capture node-priority changes in the DAG of the neural network model. Finally, the method schedules the critical paths optimally based on node priority, thereby improving the performance of the resulting parallel strategies. Our experiments show that FD-DPS achieves 12.76% and 11.78% faster training on the PnasNet_mobile and ResNet_200 models, respectively, compared with the MP-DPS and Fast methods.
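The abstract describes the core mechanism only at a high level: extracting a critical path from the operator DAG and using path lengths as scheduling priorities. Below is a minimal, hypothetical Python sketch of one standard way to do this, via a longest-path pass in reverse topological order. The graph shape, cost values, and all names (`critical_path`, `nodes`, `edges`, `cost`) are illustrative assumptions, not the paper's FD-DPS implementation.

```python
# Hypothetical sketch: critical-path extraction from an operator DAG.
# Costs and names are illustrative assumptions; this is not the paper's
# FD-DPS algorithm, only the classic longest-path idea it builds on.
from collections import defaultdict

def critical_path(nodes, edges, cost):
    """nodes: node ids in topological order.
    edges: dict mapping node -> list of successor nodes.
    cost: dict mapping node -> execution cost (e.g., estimated op latency).
    Returns (path, length): the longest (critical) path and its total cost."""
    # Longest distance from each node to a sink, computed in reverse
    # topological order; this distance can serve as a scheduling priority.
    dist = defaultdict(float)
    nxt = {}
    for u in reversed(list(nodes)):
        best, best_v = 0.0, None
        for v in edges.get(u, []):
            if dist[v] > best:
                best, best_v = dist[v], v
        dist[u] = cost[u] + best
        nxt[u] = best_v
    # The critical path starts at the node with the largest distance.
    start = max(nodes, key=lambda u: dist[u])
    path, u = [], start
    while u is not None:
        path.append(u)
        u = nxt[u]
    return path, dist[start]

# Toy DAG: a -> b -> d and a -> c -> d, with c the slower branch.
nodes = ["a", "b", "c", "d"]
edges = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
cost = {"a": 1.0, "b": 2.0, "c": 5.0, "d": 1.0}
print(critical_path(nodes, edges, cost))  # (['a', 'c', 'd'], 7.0)
```

In a dynamic variant such as the one the abstract alludes to, the `dist` values would be recomputed (or incrementally updated) as scheduling decisions change effective node costs, so that node priorities track the current critical path rather than a one-shot static analysis.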

Keywords