Energy-Efficient Acceleration of Deep Neural Networks on Realtime-Constrained Embedded Edge Devices

Bogil Kim; Sungjae Lee; Amit Ranjan Trivedi; William J. Song

doi:10.1109/ACCESS.2020.3038908

IEEE Access (Jan 2020)

Affiliations

Bogil Kim: School of Electrical and Electronic Engineering, Yonsei University, Seoul, South Korea
Sungjae Lee: School of Electrical and Electronic Engineering, Yonsei University, Seoul, South Korea
Amit Ranjan Trivedi: Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, IL, USA
William J. Song: School of Electrical and Electronic Engineering, Yonsei University, Seoul, South Korea

Journal volume & issue: Vol. 8
pp. 216259 – 216270

Abstract


This paper presents a hardware management technique that enables energy-efficient acceleration of deep neural networks (DNNs) on realtime-constrained embedded edge devices. Edge devices increasingly incorporate dedicated hardware accelerators for neural processing. Neural accelerators generally follow a host-device execution model, in which CPUs offload neural computations (e.g., matrix and vector calculations) to the accelerators for datapath-optimized execution. Such serialized execution is simple to implement and manage, but it is wasteful for resource-limited edge devices because only one type of processing unit is active in each discrete execution phase. The proposed technique, named NeuroPipe, utilizes the heterogeneous processing units of an embedded edge device to accelerate DNNs in an energy-efficient manner. In particular, NeuroPipe splits a neural network into groups of consecutive layers and pipelines their execution across different types of processing units. This pipelining lets the embedded processor operate at a lower voltage and frequency, improving energy efficiency while delivering the same performance as an uncontrolled baseline execution, or, conversely, lets the device deliver faster inference at the same energy consumption. Our measurement-driven experiments on an NVIDIA Jetson AGX Xavier with 64 tensor cores and an eight-core ARM CPU demonstrate that NeuroPipe reduces energy consumption by 11.4% on average without performance degradation, or achieves 30.5% higher performance for the same energy consumption.
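To make the layer-group pipelining idea concrete, here is a minimal Python sketch under simplifying assumptions that are ours, not the paper's: a two-stage pipeline with a single split point, hypothetical per-layer latency profiles for the CPU and the accelerator, zero inter-stage transfer cost, and CPU stage latency inversely proportional to clock frequency. The functions `best_split` and `lowest_cpu_freq`, the frequency levels, and all latency numbers are hypothetical illustrations; this is not NeuroPipe's actual partitioning or frequency-control algorithm.

```python
from typing import Sequence, Tuple


def best_split(cpu_ms: Sequence[float], acc_ms: Sequence[float]) -> Tuple[int, float]:
    """Hypothetical helper: choose k so layers [0, k) form the CPU stage and
    layers [k, n) form the accelerator stage, minimizing the steady-state
    pipeline interval max(cpu_stage, acc_stage). Transfer costs are ignored."""
    n = len(cpu_ms)
    assert n == len(acc_ms) and n > 0
    # k = 0 is the serialized baseline: every layer runs on the accelerator.
    best_k, best_interval = 0, sum(acc_ms)
    for k in range(1, n + 1):
        cpu_stage = sum(cpu_ms[:k])
        acc_stage = sum(acc_ms[k:])
        interval = max(cpu_stage, acc_stage)  # the slower stage sets throughput
        if interval < best_interval:
            best_k, best_interval = k, interval
    return best_k, best_interval


def lowest_cpu_freq(cpu_stage_ms: float, acc_stage_ms: float, target_ms: float,
                    freq_levels_ghz: Sequence[float], f_max_ghz: float) -> float:
    """Hypothetical helper: return the lowest CPU frequency level whose scaled
    CPU stage keeps the pipeline interval within target_ms. cpu_stage_ms is
    measured at f_max_ghz and assumed to scale as f_max / f. Falls back to
    f_max_ghz if no level meets the target."""
    for f in sorted(freq_levels_ghz):
        scaled_cpu_ms = cpu_stage_ms * f_max_ghz / f
        if max(scaled_cpu_ms, acc_stage_ms) <= target_ms:
            return f
    return f_max_ghz


if __name__ == "__main__":
    # Made-up per-layer latencies (ms) for a six-layer network.
    cpu_ms = [2.0, 3.0, 4.0, 6.0, 8.0, 8.0]
    acc_ms = [1.0, 1.5, 2.0, 2.5, 3.0, 3.0]
    k, interval = best_split(cpu_ms, acc_ms)
    print(k, interval)  # 3 9.0
    # Keep the baseline's throughput (one inference per sum(acc_ms) ms)
    # while running the CPU stage as slowly as possible.
    f = lowest_cpu_freq(sum(cpu_ms[:k]), sum(acc_ms[k:]), sum(acc_ms),
                        [0.8, 1.2, 1.6, 2.0, 2.2], 2.2)
    print(f)  # 1.6
```

Because a pipeline emits one result per max-stage interval, the sketch spends the slack between that interval (9.0 ms here) and the serialized baseline (13.0 ms) on a lower CPU frequency, which mirrors the "same performance, less energy" option; keeping the CPU at full frequency and using the shorter interval instead mirrors the "faster inference" option.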

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
