IEEE Access (Jan 2023)

A Pipelining-Based Heterogeneous Scheduling and Energy-Throughput Optimization Scheme for CNNs Leveraging Apache TVM

  • Delia Velasco-Montero,
  • Bart Goossens,
  • Jorge Fernandez-Berni,
  • Angel Rodriguez-Vazquez,
  • Wilfried Philips

DOI
https://doi.org/10.1109/ACCESS.2023.3264828
Journal volume & issue
Vol. 11
pp. 35007–35021

Abstract


Extracting information of interest from continuous video streams is a computer vision task in high demand. To realize this task at the edge with the current de facto standard approach, i.e., deep learning, it is critical to optimize key performance metrics such as throughput and energy consumption according to prescribed application requirements. This enables timely decision-making while extending battery lifetime as much as possible. In this context, we propose a method to boost neural-network performance based on a co-execution strategy that exploits hardware heterogeneity on edge platforms. The enabling tool is Apache TVM, a highly efficient machine-learning compiler compatible with a diversity of hardware back-ends. The proposed approach solves the network-partitioning problem and distributes the workloads so that all the processors available on the board are used concurrently, following a pipeline scheme. We conducted experiments on various popular CNNs compiled with TVM on the Jetson TX2 platform. The experimental results, based on measurements, show a significant improvement in throughput with respect to single-processor execution, ranging from 14% to 150% across all tested networks. Power-efficient configurations were also identified, achieving energy reductions above 10%.
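
As an illustration only (not code from the paper), the sketch below shows how a CNN might be compiled with Apache TVM for two different back-ends of a heterogeneous board such as the Jetson TX2 (ARM CPU plus NVIDIA GPU), the prerequisite step for dispatching network partitions to different processors in a pipelined co-execution scheme. The model choice, target strings, and the one-compilation-per-processor structure are assumptions for illustration; the paper's actual partitioning and scheduling method is described in the full text.

  import numpy as np
  import tvm
  from tvm import relay
  from tvm.relay import testing
  from tvm.contrib import graph_executor

  # Hypothetical workload: a ResNet-18 defined in Relay's testing models.
  mod, params = testing.resnet.get_workload(num_layers=18, batch_size=1)

  # Targets for the two processors of the board (illustrative strings for an
  # aarch64 CPU and a CUDA-capable GPU, as found on the Jetson TX2).
  cpu_target = tvm.target.Target("llvm -mtriple=aarch64-linux-gnu")
  gpu_target = tvm.target.Target("cuda", host="llvm -mtriple=aarch64-linux-gnu")

  # Compile the network once per back-end. In a pipelined co-execution scheme,
  # each network partition would instead be compiled only for the processor
  # it is assigned to, and partitions would run concurrently on their inputs.
  with tvm.transform.PassContext(opt_level=3):
      cpu_lib = relay.build(mod, target=cpu_target, params=params)
      gpu_lib = relay.build(mod, target=gpu_target, params=params)

  # Example inference on the GPU-compiled module.
  dev = tvm.cuda(0)
  module = graph_executor.GraphModule(gpu_lib["default"](dev))
  module.set_input("data", np.random.rand(1, 3, 224, 224).astype("float32"))
  module.run()
  output = module.get_output(0)

In this sketch, both binaries contain the full network; a co-execution pipeline as described in the abstract would split the graph so that consecutive stages run on different processors and successive frames overlap in time.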

Keywords