IEEE Access (Jan 2023)
Distributed Stochastic Gradient Descent With Compressed and Skipped Communication
Abstract
This paper introduces CompSkipDSGD, a distributed stochastic gradient descent algorithm that improves communication efficiency by compressing gradients and selectively skipping communication. In addition to compression, CompSkipDSGD allows both the workers and the server to skip communication in any iteration of training and defer it to future iterations without significantly decreasing test accuracy. Experiments on the large-scale ImageNet dataset show that CompSkipDSGD saves hundreds of gigabytes of communication while maintaining accuracy comparable to state-of-the-art algorithms. These experimental results are supported by a theoretical analysis that establishes the convergence of CompSkipDSGD under standard assumptions. Overall, CompSkipDSGD could be useful for reducing communication costs in distributed deep learning and for enabling the use of large-scale datasets and models in complex environments.
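To make the mechanism described above concrete, the following minimal sketch illustrates the general idea of combining gradient compression with skipped communication that is carried forward locally. The top-k compressor, the norm-based skip rule, and all names and thresholds below are assumptions chosen for illustration; they are not the specific operators or skip criteria defined by CompSkipDSGD in the paper.

```python
# Illustrative sketch only: the compression operator, the skip rule, and the
# parameter names are assumptions, not the paper's CompSkipDSGD specification.
import numpy as np

def topk_compress(grad, k):
    """Keep the k largest-magnitude entries of the gradient; zero out the rest."""
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    compressed = np.zeros_like(grad)
    compressed[idx] = grad[idx]
    return compressed

class SkippingWorker:
    """A worker that may skip sending its compressed update in a given
    iteration and accumulate it locally for a future iteration instead."""
    def __init__(self, dim, k, skip_threshold):
        self.residual = np.zeros(dim)          # locally reserved (unsent) update
        self.k = k
        self.skip_threshold = skip_threshold   # assumed skip criterion

    def step(self, grad):
        # Fold the new gradient into whatever was held back earlier.
        self.residual += grad
        update = topk_compress(self.residual, self.k)
        # Assumed skip rule: communicate only if the compressed update is large enough.
        if np.linalg.norm(update) < self.skip_threshold:
            return None                        # skip this round; keep accumulating
        self.residual -= update                # retain only the uncompressed remainder
        return update                          # send this to the server

# Toy usage: the server averages whatever updates arrive in this round.
workers = [SkippingWorker(dim=10, k=3, skip_threshold=0.5) for _ in range(4)]
grads = [np.random.randn(10) * 0.3 for _ in workers]
received = [u for u in (w.step(g) for w, g in zip(workers, grads)) if u is not None]
if received:
    aggregate = np.mean(received, axis=0)
```

In this sketch, skipped updates are not discarded: they remain in the worker's residual and are folded into a later communication round, which is the intuition behind reserving communication for future iterations.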
Keywords