IEEE Access (Jan 2021)

ODMDEF: On-Device Multi-DNN Execution Framework Utilizing Adaptive Layer-Allocation on General Purpose Cores and Accelerators

  • Cheolsun Lim,
  • Myungsun Kim

DOI
https://doi.org/10.1109/ACCESS.2021.3088861
Journal volume & issue
Vol. 9
pp. 85403–85417

Abstract

On-device DNN processing has become a common interest in autonomous driving research. For better accuracy, both the number of DNN models and their complexity have increased. In response, hardware platforms that combine multicore CPUs with DNN accelerators have been released, with the GPU generally serving as the accelerator. When multiple DNN workloads arrive sporadically, the GPU can easily be oversubscribed, leading to unexpected performance bottlenecks. We propose an on-device CPU-GPU co-scheduling framework for multi-DNN execution that removes this barrier and keeps DNN executions from being bounded by the GPU. Our framework fills unused CPU cycles with DNN computations to ease the GPU's computational burden. To provide a seamless computing environment across the two core types, it formats each layer execution according to the computational methods supported by the CPU and GPU cores. To cope with irregular arrivals of DNN workloads and their fluctuating demands for hardware resources, the framework dynamically selects the best-fit core type by comparing the current availability of the two core types; during core selection, offline-trained prediction models are used to accurately predict the execution time of the issued layer. The framework also mitigates the large performance deviations that even identical DNN models can exhibit under the GPU-agnostic process scheduler of the underlying OS. In addition, it minimizes the memory-copy overhead that inevitably occurs during data synchronization between the heterogeneous cores; to do so, we analyze the GPU-to-CPU and CPU-to-GPU transfer cases separately and apply the solution best suited to each case. For multi-DNN inference jobs on the NVIDIA Jetson AGX Xavier platform, our framework speeds up execution time by up to 46.6% over the GPU-only solution.
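
The per-layer core-selection step described above can be illustrated with a minimal sketch. All names here (Layer, predicted_time, select_core) and the linear cost model are hypothetical, not taken from the ODMDEF implementation; the sketch only conveys the comparative judgement: each issued layer goes to whichever core type is predicted to finish it first, given the backlog already queued there.

    from dataclasses import dataclass

    @dataclass
    class Layer:
        flops: float  # computational cost of the issued layer

    def predicted_time(layer: Layer, core: str) -> float:
        # Stand-in for the paper's offline-trained prediction models:
        # a per-core linear cost model with made-up coefficients.
        seconds_per_flop = {"cpu": 2.0e-9, "gpu": 0.3e-9}
        return seconds_per_flop[core] * layer.flops

    def select_core(layer: Layer, backlog: dict) -> str:
        # Comparative judgement: current backlog on each core type
        # plus the predicted execution time of the issued layer.
        finish = {c: backlog[c] + predicted_time(layer, c)
                  for c in ("cpu", "gpu")}
        return min(finish, key=finish.get)

    # Example: with the GPU already busy for 5 ms, a small layer is
    # routed to otherwise-idle CPU cores.
    print(select_core(Layer(flops=1e6), {"cpu": 0.0, "gpu": 5e-3}))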

Keywords