Applied Sciences (Dec 2020)

Communication Optimization Schemes for Accelerating Distributed Deep Learning Systems

  • Jaehwan Lee,
  • Hyeonseong Choi,
  • Hyeonwoo Jeong,
  • Baekhyeon Noh,
  • Ji Sun Shin

DOI: https://doi.org/10.3390/app10248846
Journal volume & issue: Vol. 10, no. 24, p. 8846

Abstract

In a distributed deep learning system, the parameter server and workers must communicate to exchange gradients and parameters, and this communication cost grows as the number of workers increases. This paper presents a communication data optimization scheme that mitigates the loss in throughput caused by communication bottlenecks in distributed deep learning. We propose two methods. The first is a layer dropping scheme that reduces communication data by comparing a representative value of each hidden layer against a threshold; layers below the threshold are not transmitted. To preserve training accuracy, the gradients that are not sent to the parameter server are accumulated in the worker’s local cache, and once their accumulated value exceeds the threshold, they are transmitted to the parameter server. The second is an efficient threshold selection method, which computes the threshold from the L1 norm of each hidden layer’s gradients rather than from the individual gradient values. Our data optimization scheme reduces communication time by about 81% and total training time by about 70% in a 56 Gbit/s network environment.
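The per-layer decision the abstract describes can be sketched in plain Python. This is a minimal illustration under assumptions not stated in the abstract: gradients are flat lists of floats, the "representative value" of a layer is taken to be the L1 norm of its gradient, and `keep_ratio` is a hypothetical knob for the threshold selection step, not a parameter from the paper.

```python
def l1_norm(vec):
    # L1 norm of a flat gradient vector.
    return sum(abs(x) for x in vec)

def select_threshold(grads, keep_ratio=0.5):
    # Hypothetical threshold selection: rank layers by the L1 norm of
    # their gradients and pick the norm at the cut point, so roughly
    # keep_ratio of the layers pass on a typical step.
    norms = sorted(l1_norm(g) for g in grads.values())
    cut = min(int(len(norms) * (1 - keep_ratio)), len(norms) - 1)
    return norms[cut]

def drop_layers(grads, cache, threshold):
    # Per layer: add the current gradient to any residual held in the
    # worker's local cache; transmit the accumulated gradient only when
    # its L1 norm reaches the threshold, otherwise keep accumulating.
    to_send = {}
    for name, g in grads.items():
        acc = [c + x for c, x in zip(cache.get(name, [0.0] * len(g)), g)]
        if l1_norm(acc) >= threshold:
            to_send[name] = acc              # goes to the parameter server
            cache[name] = [0.0] * len(g)     # reset the local cache
        else:
            cache[name] = acc                # defer, keep in local cache
    return to_send
```

A layer whose gradients stay small is skipped on each step, but its cached residual keeps growing and is eventually transmitted, which is how the scheme avoids losing accuracy to dropped updates.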

Keywords