DNN Partitioning for Inference Throughput Acceleration at the Edge

Thomas Feltin; Leo Marcho; Juan-Antonio Cordero-Fuertes; Frank Brockners; Thomas H. Clausen

doi:10.1109/ACCESS.2023.3244497

IEEE Access (Jan 2023)

Affiliations

Thomas Feltin: École Polytechnique, Palaiseau, France
Leo Marcho: Cisco Systems, San Jose, CA, USA
Juan-Antonio Cordero-Fuertes: École Polytechnique, Palaiseau, France
Frank Brockners: Cisco Systems, San Jose, CA, USA
Thomas H. Clausen: École Polytechnique, Palaiseau, France

DOI: https://doi.org/10.1109/ACCESS.2023.3244497
Journal volume & issue: Vol. 11, pp. 52236–52249

Abstract

Deep neural network (DNN) inference on streaming data requires computing resources to satisfy inference throughput requirements. However, latency- and privacy-sensitive deep learning applications cannot afford to offload computation to remote clouds because of the implied transmission cost and the lack of trust in third-party cloud providers. Among solutions to increase performance while keeping computation in a constrained environment, hardware acceleration can be onerous, and model optimization requires extensive design efforts while hindering accuracy. DNN partitioning is a third, complementary approach: it consists of distributing the inference workload over several available edge devices, taking into account the edge network properties and the DNN structure, with the objective of maximizing the inference throughput (number of inferences per second). This paper introduces a method to predict inference and transmission latencies for multi-threaded distributed DNN deployments, and defines an optimization process to maximize the inference throughput. A branch and bound solver is then presented and analyzed to quantify the achieved performance and complexity. This analysis leads to the definition of the acceleration region, which describes deterministic conditions on the DNN and network properties under which DNN partitioning is beneficial. Finally, experimental results confirm the simulations and show inference throughput improvements in sample edge deployments.
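
As a rough illustration of the throughput objective described in the abstract, the sketch below scores every way of cutting a chain of layers across a sequence of devices and keeps the best one. It is not the paper's latency predictor or its branch and bound solver: it assumes a simplified pipeline model in which throughput is the inverse of the slowest stage (compute or transfer), it uses exhaustive search, and every name and number in it (`pipeline_throughput`, `best_partition`, the FLOP counts, the link rates) is hypothetical.

```python
from itertools import combinations


def pipeline_throughput(layer_flops, out_bytes, device_flops, link_bps, cuts):
    """Inferences per second for a chain DNN cut at the layer indices in `cuts`.

    Assumed model (not from the paper): segment i runs on device i, each
    cut sends the preceding layer's output over one link, and the pipeline
    rate is set by its slowest stage.
    """
    bounds = [0, *cuts, len(layer_flops)]
    stage_times = []
    for dev, (lo, hi) in enumerate(zip(bounds, bounds[1:])):
        # Compute time of this segment on its device.
        stage_times.append(sum(layer_flops[lo:hi]) / device_flops[dev])
        # Transfer time to the next device, if this is not the last segment.
        if hi < len(layer_flops):
            stage_times.append(8 * out_bytes[hi - 1] / link_bps)
    return 1.0 / max(stage_times)


def best_partition(layer_flops, out_bytes, device_flops, link_bps):
    """Exhaustively search cut points; returns (cuts, throughput)."""
    n = len(layer_flops)
    # Baseline: no cut, everything runs on the first device.
    best = ((), pipeline_throughput(layer_flops, out_bytes,
                                    device_flops, link_bps, ()))
    for k in range(1, len(device_flops)):
        for cuts in combinations(range(1, n), k):
            rate = pipeline_throughput(layer_flops, out_bytes,
                                       device_flops, link_bps, cuts)
            if rate > best[1]:
                best = (cuts, rate)
    return best


if __name__ == "__main__":
    layers = [4e9, 4e9, 4e9, 4e9]   # hypothetical FLOPs per layer
    outputs = [1e5, 1e5, 1e5, 1e5]  # hypothetical output size per layer, bytes
    devices = [8e9, 8e9]            # hypothetical FLOP/s per device
    print(best_partition(layers, outputs, devices, link_bps=1e8))
    print(best_partition(layers, outputs, devices, link_bps=1e5))
```

With the fast link in this toy setup, splitting the chain in the middle doubles the throughput relative to a single device; with the slow link, the transfer becomes the bottleneck and the search keeps everything on one device. That loosely mirrors the idea that partitioning only pays off under certain network and DNN conditions.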

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
