Applied Sciences (Sep 2020)

Empirical Performance Analysis of Collective Communication for Distributed Deep Learning in a Many-Core CPU Environment

  • Junghoon Woo,
  • Hyeonseong Choi,
  • Jaehwan Lee

Journal volume & issue
Vol. 10, no. 19
p. 6717



To accommodate large volumes of training data and complex training models, distributed deep learning training is being used increasingly often. However, communication bottlenecks between distributed systems degrade the performance of distributed deep learning training. In this study, we propose a new collective communication method for Python environments that utilizes Multi-Channel Dynamic Random Access Memory (MCDRAM) in Intel Xeon Phi Knights Landing processors. Because major deep learning software platforms, such as TensorFlow and PyTorch, offer Python as their main development language, we developed an efficient communication library by adapting the Memkind library, a C-based library for utilizing MCDRAM, a high-performance memory. For performance evaluation, we tested collective communication methods that are popular in distributed deep learning, such as Broadcast, Gather, and AllReduce. We conducted experiments to analyze the effect of high-performance memory and processor location on communication performance. In addition, we analyzed performance in a Docker environment, given the recent major trend toward cloud computing. Through extensive experiments on our testbed, we confirmed that communication with our proposed method improved performance by up to 487%.
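
For readers unfamiliar with the three collectives named above, the following is a minimal sketch of their semantics only. It simulates the workers ("ranks") as entries of a Python list inside a single process. It is not the communication library described in this study, it does not use MCDRAM or Memkind, and the function names `broadcast`, `gather`, and `allreduce_sum` are hypothetical names chosen for illustration.

```python
from typing import List

Vector = List[float]


def broadcast(buffers: List[Vector], root: int) -> List[Vector]:
    """Broadcast: every rank ends up with a copy of the root rank's data."""
    return [list(buffers[root]) for _ in buffers]


def gather(buffers: List[Vector], root: int) -> List[List[Vector]]:
    """Gather: only the root receives every rank's data; other ranks receive nothing."""
    return [
        [list(b) for b in buffers] if rank == root else []
        for rank in range(len(buffers))
    ]


def allreduce_sum(buffers: List[Vector]) -> List[Vector]:
    """AllReduce (sum): every rank ends up with the element-wise sum over all ranks."""
    total = [sum(column) for column in zip(*buffers)]
    return [list(total) for _ in buffers]


# Three simulated workers, each holding a local gradient of length 2 (made-up values).
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

after_broadcast = broadcast(grads, root=0)
after_gather = gather(grads, root=1)
after_allreduce = allreduce_sum(grads)
```

In data-parallel training, AllReduce is the collective typically used to combine gradients so that every worker applies the same update, which is why its communication cost matters for overall training performance.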