IEEE Access (Jan 2020)

Roofline-Model-Based Design Space Exploration for Dataflow Techniques of CNN Accelerators

  • Chan Park
  • Sungkyung Park
  • Chester Sungchung Park

DOI
https://doi.org/10.1109/ACCESS.2020.3025550
Journal volume & issue
Vol. 8
pp. 172509 – 172523

Abstract

To compute convolutional layers efficiently, a complex design space must be explored (e.g., the dataflow techniques associated with the layer parameters, loop transformation techniques, and hardware parameters). For efficient design space exploration (DSE) of various dataflow techniques, namely, the weight-stationary (WS), output-stationary (OS), row-stationary (RS), and no local reuse (NLR) techniques, the processing element (PE) structure and computational pattern of each dataflow technique are analyzed. Various performance metrics are calculated, namely, the throughput (in giga-operations per second, GOPS), the computation-to-communication ratio (CCR), the on-chip memory usage, and the off-chip memory bandwidth, as closed-form expressions of the layer and hardware parameters. In addition, loop interchange and loop unrolling techniques with a double-buffer architecture are assumed. Numerous roofline-model-based simulations are performed to explore the relevant dataflow techniques for a wide variety of convolutional layers of typical neural networks. Through these simulations, this paper provides insights into the trends in accelerator performance as the layer parameters change. For convolutional layers with large input and output feature map (ifmap and ofmap) widths and heights, the GOPS of the NLR dataflow technique tends to be higher than that of the other techniques. For convolutional layers with small weight and ofmap widths and heights, the RS dataflow technique achieves optimal GOPS and on-chip memory usage. For convolutional layers with small weight widths and heights, the GOPS of the WS dataflow technique tends to be high. For convolutional layers with small ofmap widths and heights, the OS dataflow technique achieves optimal GOPS and on-chip memory usage.
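The roofline model underlying these simulations bounds the attainable throughput by either the peak compute rate or the product of the off-chip memory bandwidth and the CCR. The following Python snippet is a minimal sketch of such an evaluation; the layer parameters, peak GOPS, bandwidth, and per-dataflow CCR values are illustrative assumptions, not the closed-form expressions derived in the paper.

```python
# Minimal roofline-model sketch for one convolutional layer.
# All names and numbers are illustrative assumptions, not the paper's expressions.

def conv_ops(N, M, R, C, K):
    """Total operations of a convolutional layer with N ifmap channels,
    M ofmap channels, an R x C ofmap, and K x K weights
    (each multiply-accumulate counted as two operations)."""
    return 2 * N * M * R * C * K * K

def attainable_gops(peak_gops, bandwidth_gbps, ccr):
    """Roofline bound: performance is limited either by compute (peak GOPS)
    or by memory traffic (off-chip bandwidth times the CCR in ops/byte)."""
    return min(peak_gops, bandwidth_gbps * ccr)

if __name__ == "__main__":
    # Hypothetical accelerator: 200 GOPS peak, 12.8 GB/s off-chip bandwidth.
    peak, bw = 200.0, 12.8
    # Hypothetical CCRs for each dataflow technique on a given layer.
    ccr_per_dataflow = {"WS": 9.0, "OS": 14.0, "RS": 20.0, "NLR": 6.0}
    for name, ccr in ccr_per_dataflow.items():
        print(f"{name}: attainable {attainable_gops(peak, bw, ccr):.1f} GOPS")
```

In such a DSE loop, each dataflow technique contributes its own CCR (and on-chip memory usage) per layer, and the roofline bound then indicates whether that layer is compute-bound or bandwidth-bound under the chosen hardware parameters.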

Keywords