2D-HRA: Two-Dimensional Hierarchical Ring-Based All-Reduce Algorithm in Large-Scale Distributed Machine Learning

Youhe Jiang; Huaxi Gu; Yunfeng Lu; Xiaoshan Yu

doi:10.1109/ACCESS.2020.3028367

IEEE Access (Jan 2020)

Affiliations

Youhe Jiang: The State Key Laboratory of Integrated Service Networks, Xidian University, Xi’an, China
Huaxi Gu: The State Key Laboratory of Integrated Service Networks, Xidian University, Xi’an, China
Yunfeng Lu: The State Key Laboratory of Integrated Service Networks, Xidian University, Xi’an, China
Xiaoshan Yu: The State Key Laboratory of Integrated Service Networks, Xidian University, Xi’an, China

DOI: https://doi.org/10.1109/ACCESS.2020.3028367
Journal volume & issue: Vol. 8, pp. 183488–183494

Abstract

Gradient synchronization, the communication process among machines in large-scale distributed machine learning (DML), plays a crucial role in DML performance. As distributed clusters continue to grow, state-of-the-art DML synchronization algorithms suffer from high latency when scaled to thousands of GPUs. In this article, we propose 2D-HRA, a two-dimensional hierarchical ring-based all-reduce algorithm for large-scale DML. 2D-HRA combines the ring algorithm with more latency-optimal hierarchical methods and synchronizes parameters along two dimensions to make full use of the available bandwidth. Simulation results show that 2D-HRA effectively alleviates the high latency and accelerates synchronization in large-scale clusters. Compared with traditional ring-based algorithms, 2D-HRA reduces gradient synchronization time by up to 76.9% across clusters of different scales.
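
To illustrate why a two-dimensional arrangement can reduce latency, the following is a minimal sketch, not the authors' 2D-HRA implementation. The abstract does not give the algorithm's details, so the sketch assumes a generic scheme: workers are placed on a hypothetical rows x cols grid, a standard ring all-reduce runs along each row, and then another runs along each column. The function names `ring_all_reduce` and `two_d_ring_all_reduce` and the step-counting model (one step per simultaneous neighbor exchange) are assumptions made for illustration only.

```python
def ring_all_reduce(vectors):
    """Simulate a ring all-reduce (sum) over equal-length vectors, one per worker.

    Returns (list of reduced vectors, number of communication steps).
    """
    n = len(vectors)
    data = [list(v) for v in vectors]
    if n == 1:
        return data, 0
    size = len(data[0])
    # Split the vector into n contiguous chunks (some may be empty).
    bounds = [(i * size) // n for i in range(n + 1)]
    steps = 0

    # Reduce-scatter: at step s, worker w sends chunk (w - s) % n to worker w + 1,
    # which adds it to its own copy of that chunk.
    for s in range(n - 1):
        sends = []
        for w in range(n):
            c = (w - s) % n
            sends.append((c, data[w][bounds[c]:bounds[c + 1]]))
        for w, (c, payload) in enumerate(sends):
            dst = (w + 1) % n
            for k, x in enumerate(payload):
                data[dst][bounds[c] + k] += x
        steps += 1

    # All-gather: worker w forwards chunk (w + 1 - s) % n, which is fully reduced.
    for s in range(n - 1):
        sends = []
        for w in range(n):
            c = (w + 1 - s) % n
            sends.append((c, data[w][bounds[c]:bounds[c + 1]]))
        for w, (c, payload) in enumerate(sends):
            dst = (w + 1) % n
            data[dst][bounds[c]:bounds[c + 1]] = payload
        steps += 1

    return data, steps


def two_d_ring_all_reduce(vectors, rows, cols):
    """Assumed two-phase scheme: ring all-reduce per row, then per column.

    Rows (and then columns) are treated as running in parallel, so each phase
    costs the steps of a single ring of that length.
    """
    assert len(vectors) == rows * cols
    grid = [vectors[r * cols:(r + 1) * cols] for r in range(rows)]

    row_steps = 0
    for r in range(rows):
        grid[r], row_steps = ring_all_reduce(grid[r])

    col_steps = 0
    for c in range(cols):
        column, col_steps = ring_all_reduce([grid[r][c] for r in range(rows)])
        for r in range(rows):
            grid[r][c] = column[r]

    flat = [v for row in grid for v in row]
    return flat, row_steps + col_steps


if __name__ == "__main__":
    workers = [[i + j for j in range(8)] for i in range(16)]
    _, flat_steps = ring_all_reduce(workers)
    _, grid_steps = two_d_ring_all_reduce(workers, rows=4, cols=4)
    print("flat ring steps:", flat_steps)
    print("4x4 grid steps:", grid_steps)
```

Under this step-counting model, a flat ring over N workers needs 2(N - 1) steps, while the grid version needs 2(rows - 1) + 2(cols - 1). Note that this naive sketch exchanges the full vector again in the column phase, so it trades extra traffic for fewer steps; how 2D-HRA itself balances latency against bandwidth is described in the full paper, not here.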

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
