IEEE Open Journal of the Communications Society (Jan 2022)

A Low-Complexity and Adaptive Distributed Source Coding Design for Model Aggregation in Distributed Learning

  • Naifu Zhang,
  • Meixia Tao

DOI
https://doi.org/10.1109/OJCOMS.2022.3228813
Journal volume & issue
Vol. 3
pp. 2444 – 2460

Abstract

A major bottleneck in distributed learning is the communication overhead of exchanging intermediate model update parameters between the worker nodes and the parameter server. It has recently been found that the local gradients of different worker nodes are correlated, so distributed source coding (DSC) can be applied to improve communication efficiency by exploiting this correlation. However, exploiting gradient correlation in distributed learning is highly non-trivial because the correlation is unknown and time-varying. In this paper, we first propose a DSC framework for distributed learning, named successive Wyner-Ziv coding, based on quantization and Slepian-Wolf (SW) coding. We prove that the proposed framework achieves the theoretical minimum communication cost from an information-theoretic perspective. We also propose a low-complexity and adaptive DSC design for distributed learning, consisting of a gradient statistics estimator, a rate controller, and a log-likelihood ratio (LLR) computer. The gradient statistics estimator estimates the gradient statistics online using only the quantized gradients received in previous iterations, so it introduces no extra communication cost. By introducing a semi-analytical Monte Carlo simulation, the computational complexity of the rate controller and the LLR computer is reduced to grow only linearly with the number of worker nodes. Finally, we design a DSC-based distributed learning process and show that the extra delay introduced by DSC does not scale with the number of worker nodes.
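
To make the idea concrete, below is a minimal, illustrative Python/NumPy sketch of the kind of pipeline the abstract describes: workers quantize their gradients, the server estimates gradient statistics online from previously received quantized gradients only, and a simple rate controller assigns fewer bits to workers whose gradients are strongly correlated with already-decoded ones. The uniform scalar quantizer, the momentum-based statistics estimate, and the bit-allocation rule here are assumptions made purely for illustration; the paper's actual scheme uses successive Wyner-Ziv coding with Slepian-Wolf coding and an LLR computer, which are not reproduced in this toy sketch.

# Illustrative sketch only (not the authors' implementation): correlated worker
# gradients can be sent at reduced rate when the server accounts for correlation
# with already-decoded workers. All names and rules below are assumptions.
import numpy as np

def uniform_quantize(x, num_bits, x_max):
    """Uniform scalar quantizer on [-x_max, x_max] with 2**num_bits levels."""
    levels = 2 ** num_bits
    step = 2 * x_max / levels
    q = np.clip(np.round(x / step), -(levels // 2), levels // 2 - 1)
    return q * step

class OnlineGradientStats:
    """Estimate per-worker gradient variance and pairwise correlation online,
    using only quantized gradients from previous iterations (so no extra
    communication is needed, as stated in the abstract)."""
    def __init__(self, num_workers, momentum=0.9):
        self.rho = np.eye(num_workers)      # running correlation estimate
        self.var = np.ones(num_workers)     # running variance estimate
        self.m = momentum

    def update(self, quantized_grads):      # shape: (num_workers, dim)
        g = quantized_grads
        self.var = self.m * self.var + (1 - self.m) * (g.var(axis=1) + 1e-12)
        self.rho = self.m * self.rho + (1 - self.m) * np.corrcoef(g)

def rate_for_worker(k, stats, base_bits=8, min_bits=2):
    """Toy rate controller: a worker whose gradient is highly correlated with
    already-decoded workers (indices < k) is assigned fewer bits."""
    if k == 0:
        return base_bits
    max_corr = np.abs(stats.rho[k, :k]).max()
    return max(min_bits, base_bits - int(np.floor(base_bits * max_corr)))

# Usage example with synthetic correlated gradients.
rng = np.random.default_rng(0)
num_workers, dim = 4, 1000
stats = OnlineGradientStats(num_workers)
common = rng.normal(size=dim)               # shared component -> correlation
for it in range(5):
    grads = np.stack([0.8 * common + 0.2 * rng.normal(size=dim)
                      for _ in range(num_workers)])
    sent, bits_used = [], []
    for k in range(num_workers):
        bits = rate_for_worker(k, stats)
        bits_used.append(bits)
        sent.append(uniform_quantize(grads[k], bits, x_max=4.0))
    stats.update(np.stack(sent))
    aggregate = np.mean(sent, axis=0)       # model aggregation at the server
    print(f"iteration {it}: bits per worker = {bits_used}")

In this toy run, after the first iteration the estimated cross-worker correlation is high, so workers beyond the first are assigned far fewer bits, which mirrors (at a very simplified level) how the paper's DSC design exploits gradient correlation to reduce communication cost.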

Keywords