Energy-Efficient Dataflow Scheduling of CNN Applications for Vector-SIMD DSP

Wontae Kim; Sangheon Lee; Ilwi Yun; Chulhee Lee; Kyujoong Lee; Hyuk-Jae Lee

doi:10.1109/ACCESS.2022.3197206

IEEE Access (Jan 2022)

Affiliations

Wontae Kim: Department of Electrical and Computer Engineering, Seoul National University, Seoul, South Korea
Sangheon Lee: Department of Electrical and Computer Engineering, Seoul National University, Seoul, South Korea
Ilwi Yun: Department of Electrical and Computer Engineering, Seoul National University, Seoul, South Korea
Chulhee Lee: System LSI Division, Samsung Electronics Corporation, Hwaseong, South Korea
Kyujoong Lee: School of AI Convergence, Sungshin Women’s University, Seoul, South Korea
Hyuk-Jae Lee: Department of Electrical and Computer Engineering, Seoul National University, Seoul, South Korea

Journal volume & issue: Vol. 10
pp. 86234 – 86247

Abstract

Dataflow-scheduling techniques for convolutional neural networks (CNNs) have been studied extensively to minimize off-chip memory access. However, the efficiency of previously proposed techniques is limited because their optimizations consider only general-purpose hardware such as FPGAs and GPUs. To overcome this limitation, this paper proposes a dataflow-scheduling technique for vector-SIMD DSPs that minimizes the energy consumed by off-chip memory access. First, the proposed technique groups as many of the given layers as possible. Within a group, the tiles of different layers are executed in sequence without off-chip memory access, except for the first and last layers of the group. The number of layers in each group is chosen to minimize off-chip memory energy, as estimated by the proposed off-chip memory energy model. However, grouping layers introduces additional computation. To minimize this overhead, this paper formulates and solves an optimization problem for the grouped layers. Second, for layers that cannot be grouped, tiling along the W-axis is not applied, which maximizes the overlapped data between consecutive tiles. Consequently, the reuse of overlapped data in the on-chip buffer is maximized, reducing the energy consumed by off-chip memory. For evaluation, a cycle-accurate simulation environment is established that measures the energy consumption of the off-chip memory by tracing the data transferred between a vector-SIMD DSP and off-chip memory. The experimental results show that, compared with the baseline tiling and scheduling techniques, the proposed technique reduces energy consumption by an average of 51% for CNN applications such as Tiny YOLOv2, MobileNetv1, and VDSR.
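The traffic argument behind layer grouping and full-width (no W-axis) tiling can be made concrete with a back-of-the-envelope model. The Python sketch below is not the authors' energy model or scheduler: it assumes stride-1 "same" convolutions, 8-bit data, and a fixed per-byte off-chip energy (`DRAM_PJ_PER_BYTE`), and every name in it (`ConvLayer`, `group_offchip_bytes`, `input_rows_for_tile`, and so on) is hypothetical. It counts only off-chip bytes and ignores on-chip buffer capacity, which in practice limits how many layers can be grouped.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical constants for illustration only (not taken from the paper).
BYTES_PER_ELEM = 1           # assume 8-bit activations and weights
DRAM_PJ_PER_BYTE = 100.0     # assumed off-chip energy per byte, in pJ


@dataclass
class ConvLayer:
    """Stride-1 KxK convolution with 'same' padding (a simplifying assumption)."""
    h: int      # feature-map height
    w: int      # feature-map width
    c_in: int   # input channels
    c_out: int  # output channels
    k: int      # kernel size

    def ifmap_bytes(self) -> int:
        return self.h * self.w * self.c_in * BYTES_PER_ELEM

    def ofmap_bytes(self) -> int:
        return self.h * self.w * self.c_out * BYTES_PER_ELEM

    def weight_bytes(self) -> int:
        return self.k * self.k * self.c_in * self.c_out * BYTES_PER_ELEM


def group_offchip_bytes(group: List[ConvLayer]) -> int:
    """Off-chip traffic when the layers run as one group: intermediate
    feature maps stay on chip, so only the first layer's input, the last
    layer's output, and all weights cross the off-chip interface."""
    weights = sum(layer.weight_bytes() for layer in group)
    return group[0].ifmap_bytes() + weights + group[-1].ofmap_bytes()


def offchip_energy_pj(groups: List[List[ConvLayer]]) -> float:
    """Crude energy estimate: total bytes transferred times a per-byte cost."""
    return DRAM_PJ_PER_BYTE * sum(group_offchip_bytes(g) for g in groups)


def input_rows_for_tile(group: List[ConvLayer], out_rows: int) -> int:
    """Rows of the group's input needed to produce `out_rows` output rows.
    Each stride-1 KxK layer widens the receptive field by k - 1 rows; this
    halo is the source of the extra computation introduced by grouping."""
    return out_rows + sum(layer.k - 1 for layer in group)


def overlap_bytes_between_row_tiles(layer: ConvLayer) -> int:
    """Input data shared by two vertically adjacent full-width tiles of one
    layer: k - 1 rows spanning the whole width W and all input channels."""
    return (layer.k - 1) * layer.w * layer.c_in * BYTES_PER_ELEM


if __name__ == "__main__":
    layers = [ConvLayer(h=56, w=56, c_in=32, c_out=32, k=3) for _ in range(3)]
    print("separate:", offchip_energy_pj([[layer] for layer in layers]), "pJ")
    print("grouped: ", offchip_energy_pj([layers]), "pJ")
    print("input rows per 8-row output tile:", input_rows_for_tile(layers, 8))
    print("overlap per tile boundary:", overlap_bytes_between_row_tiles(layers[0]), "bytes")
```

For three identical 3x3 layers with 56x56x32 feature maps, this model gives 629,760 off-chip bytes when each layer runs on its own versus 228,352 bytes when all three are grouped; the price is that each 8-row output tile needs 14 input rows instead of 8. With full-width tiles, consecutive tiles overlap only along H, so a single boundary strip (here 3,584 bytes) can be kept in the on-chip buffer and reused; tiling along W as well would add vertical boundaries whose overlap must be re-fetched or recomputed.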

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639