Algorithms (Jul 2022)

Inference Acceleration with Adaptive Distributed DNN Partition over Dynamic Video Stream

  • Jin Cao,
  • Bo Li,
  • Mengni Fan,
  • Huiyu Liu

DOI
https://doi.org/10.3390/a15070244
Journal volume & issue
Vol. 15, no. 7
p. 244

Abstract

Deep neural network-based computer vision applications have exploded in popularity and are widely used in intelligent services for IoT devices. Due to the computationally intensive nature of DNNs, the deployment and execution of intelligent applications in smart scenarios face the challenge of limited device resources. Existing job-scheduling strategies are narrowly focused and offer limited support for large-scale end-device scenarios. In this paper, we present ADDP, an adaptive distributed DNN partition method that supports video analysis on large-scale smart cameras. ADDP applies to the DNN models commonly used in computer vision and contains a feature-map layer partition (FLP) module supporting edge-to-end collaborative model partition and a feature-map size partition (FSP) module supporting multidevice parallel inference. Based on the objective of minimizing inference delay, FLP and FSP achieve a tradeoff between the computational and communication resources of different devices. We validate ADDP on heterogeneous devices and show that both the FLP module and the FSP module outperform existing approaches, reducing single-frame response latency by 10–25% compared to pure on-device processing.
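The two ideas in the abstract can be illustrated with a minimal sketch. This is not the paper's actual FLP/FSP algorithm; it assumes hypothetical per-layer FLOP counts, output sizes, device speeds, and bandwidth. The first function picks a layer split point that minimizes on-device compute + feature-map transfer + edge compute (the FLP-style tradeoff); the second partitions a feature map's rows across devices in proportion to their speed (the FSP-style parallel split).

```python
# Hedged sketch of ADDP-style partitioning (hypothetical cost model, not the
# authors' implementation).

def best_layer_split(flops, out_bytes, input_bytes,
                     dev_flops_s, edge_flops_s, bandwidth_Bps):
    """FLP-style split-point search.

    Layers [0..k) run on the camera, the intermediate feature map is sent to
    the edge, layers [k..n) run there. k == 0 means full offload of the raw
    input; k == n means fully local inference. Returns (k, latency_seconds).
    """
    n = len(flops)
    best_k, best_t = 0, float("inf")
    for k in range(n + 1):
        t_dev = sum(flops[:k]) / dev_flops_s          # on-device compute
        sent = input_bytes if k == 0 else out_bytes[k - 1]
        t_tx = sent / bandwidth_Bps                   # feature-map transfer
        t_edge = sum(flops[k:]) / edge_flops_s        # edge compute
        total = t_dev + t_tx + t_edge
        if total < best_t:
            best_k, best_t = k, total
    return best_k, best_t


def split_feature_rows(height, device_speeds):
    """FSP-style parallel split: divide a feature map's rows across devices
    in proportion to each device's speed; remainder goes to the last device."""
    total = sum(device_speeds)
    rows = [height * s // total for s in device_speeds]
    rows[-1] += height - sum(rows)
    return rows
```

For example, with three layers whose intermediate outputs shrink sharply, the search tends to place the split after the layer whose output is cheapest to transmit, rather than offloading the raw frame.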

Keywords