IEEE Open Journal of the Communications Society (Jan 2023)

Adaptive and Resilient Model-Distributed Inference in Edge Computing Systems

  • Pengzhen Li,
  • Erdem Koyuncu,
  • Hulya Seferoglu

DOI
https://doi.org/10.1109/OJCOMS.2023.3280174
Journal volume & issue
Vol. 4
pp. 1263–1273

Abstract

The traditional approach to distributed deep neural network (DNN) inference in edge computing systems is data-distributed inference. In this paradigm, each worker holds a complete pre-trained DNN model and processes the data offloaded to it. The data-distributed approach (i) incurs a high communication cost, especially when the data is large, and (ii) is memory-inefficient, as the whole model must be stored and executed on every worker. Model-distributed inference is emerging as a promising alternative, in which a DNN model is partitioned across workers. Although there is a large body of work on model-distributed training, the benefits of model distribution for inference are not yet well understood. In this paper, we analyze the potential of model-distributed inference in edge computing systems. We then develop an Adaptive and Resilient Model-Distributed Inference (AR-MDI) algorithm based on our optimal model allocation formulation. AR-MDI performs model allocation in a lightweight, decentralized way and is resilient against delayed workers and failures. We implement AR-MDI in a real testbed of NVIDIA Jetson TX2 devices and show that AR-MDI significantly improves inference time over baselines on large-scale data such as ImageNet.
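To make the model-distributed paradigm concrete, the sketch below partitions a DNN's layers across workers so that each worker stores and executes only a contiguous slice of the model, with only activations crossing worker boundaries. This is a minimal illustrative example, not the paper's AR-MDI algorithm: the layer costs, the greedy balancing heuristic, and all function names are assumptions introduced here, whereas AR-MDI uses an optimal model allocation formulation.

```python
# Illustrative sketch of model-distributed inference (NOT the paper's
# AR-MDI algorithm): layers are split into contiguous groups, one per
# worker, so each worker holds only part of the model.

def partition_layers(layer_costs, n_workers):
    """Greedily split layers into n_workers contiguous groups with
    roughly balanced total compute cost (a simple heuristic)."""
    total = sum(layer_costs)
    target = total / n_workers
    partitions, current, acc = [], [], 0.0
    for i, cost in enumerate(layer_costs):
        current.append(i)
        acc += cost
        # Close this partition once it reaches the per-worker target,
        # keeping at least one layer for each remaining worker.
        workers_left = n_workers - len(partitions) - 1
        layers_left = len(layer_costs) - i - 1
        if acc >= target and workers_left > 0 and layers_left >= workers_left:
            partitions.append(current)
            current, acc = [], 0.0
    partitions.append(current)
    return partitions

def run_inference(x, layers, partitions):
    """Each 'worker' applies its slice of layers in order; only the
    activation value crosses a worker boundary."""
    for part in partitions:
        for idx in part:
            x = layers[idx](x)
    return x

# Toy 4-layer "model" with hypothetical per-layer compute costs.
layers = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3, lambda v: v * v]
costs = [1.0, 4.0, 2.0, 3.0]
parts = partition_layers(costs, 2)
print(parts)                           # → [[0, 1], [2, 3]]
print(run_inference(2, layers, parts)) # ((2+1)*2 - 3)**2 → 9
```

Compared with data-distributed inference, each worker here needs memory for only its own layer slice, and the communication per step is a single activation rather than the raw input data, which is the advantage the abstract highlights for large inputs.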

Keywords