IEEE Access (Jan 2021)

Communication Scheduling for Gossip SGD in a Wide Area Network

  • Hideaki Oguni,
  • Kazuyuki Shudo

DOI
https://doi.org/10.1109/ACCESS.2021.3083639
Journal volume & issue
Vol. 9
pp. 77873–77881

Abstract

Deep neural networks (DNNs) achieve higher accuracy as the amount of training data increases. However, training data such as personal medical data are often privacy sensitive and cannot be collected centrally. Methods have therefore been proposed for training on distributed data that remain within a wide area network. Because of the heterogeneity of a wide area network, methods based on synchronous communication, such as all-reduce stochastic gradient descent (SGD), are not suitable; gossip SGD is promising because it is based on asynchronous communication. Communication time is a problem in a wide area network, yet gossip SGD cannot use double buffering, a technique for hiding communication time, because it communicates asynchronously. In this paper, we propose a variant of gossip SGD in which computation and communication overlap to accelerate training. The proposed method shares newer models by scheduling communication; to schedule it, the nodes exchange estimates of communication times and information about which nodes are available for communication. The method is effective in both homogeneous and heterogeneous networks. Experimental results on the CIFAR-100 and Fashion-MNIST datasets demonstrate the faster convergence of the proposed method.
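The gossip-averaging idea underlying the abstract can be illustrated with a minimal sketch: each node takes a local SGD step on its own data, then averages its model with one randomly chosen peer instead of synchronizing with all nodes. This toy simulation on a 1-D quadratic objective is an assumption-laden illustration of plain gossip SGD only; it does not model the paper's communication scheduling, overlap of computation and communication, or network latency.

```python
import random

def gossip_sgd(num_nodes=4, steps=200, lr=0.1, seed=0):
    """Toy simulation of gossip SGD on a 1-D quadratic objective.

    Each node keeps its own model copy, takes a local SGD step on its
    local loss (x - target_i)^2, then averages its model with one
    randomly chosen peer (pairwise gossip averaging). This is a
    hypothetical sketch of the baseline algorithm, not the paper's
    scheduled variant.
    """
    rng = random.Random(seed)
    # Local optima differ per node; the global optimum is their mean (0.0).
    targets = [-1.0 + 2.0 * i / (num_nodes - 1) for i in range(num_nodes)]
    models = [0.0] * num_nodes
    for _ in range(steps):
        # Local SGD step: the gradient of (x - t)^2 is 2 (x - t).
        models = [x - lr * 2.0 * (x - t) for x, t in zip(models, targets)]
        # Pairwise gossip: one random pair averages its models.
        i, j = rng.sample(range(num_nodes), 2)
        models[i] = models[j] = 0.5 * (models[i] + models[j])
    return models

print(gossip_sgd())
```

Because each exchange involves only one pair of nodes, no global barrier is needed; this is what makes gossip SGD tolerant of the heterogeneous link speeds described in the abstract.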

Keywords