Sparse-PE: A Performance-Efficient Processing Engine Core for Sparse Convolutional Neural Networks

Mahmood Azhar Qureshi; Arslan Munir

doi:10.1109/ACCESS.2021.3126708

IEEE Access (Jan 2021)

Affiliations

Mahmood Azhar Qureshi: ORCiD; Department of Computer Science, Kansas State University, Manhattan, KS, USA
Arslan Munir: ORCiD; Department of Computer Science, Kansas State University, Manhattan, KS, USA

DOI: https://doi.org/10.1109/ACCESS.2021.3126708
Journal volume & issue: Vol. 9, pp. 151458–151475

Abstract

Sparse convolutional neural network (CNN) models reduce the massive compute and memory bandwidth requirements inherently present in dense CNNs without a significant loss in accuracy. Sparse CNNs, however, present their own set of challenges, including non-linear data accesses and complex design of CNN processing elements (PEs). Recently proposed accelerators such as SCNN, Eyeriss v2, and SparTen exploit two-sided sparsity, that is, sparsity in both the input activations and the weights, to accelerate CNN inference. These accelerators, however, suffer from a multitude of problems that limit their applicability, such as inefficient micro-architecture (SCNN, Eyeriss v2), complex PE design (Eyeriss v2), and no support for non-unit stride convolutions (SCNN) or FC layers (SparTen, SCNN). To address these issues in contemporary sparse CNN accelerators, we propose Sparse-PE, a multi-threaded and flexible CNN PE capable of handling both dense and sparse CNNs. The Sparse-PE core uses a binary mask representation and actively skips computations involving zeros in favor of non-zero computations, thereby drastically increasing the effective throughput and hardware utilization. Unlike previous designs, the Sparse-PE core is generic in nature and not targeted towards a specific accelerator, and thus can also be used as a standalone sparse dot product compute engine. We evaluate the performance of the core using a custom-built cycle-accurate simulator. Our simulations show that the Sparse-PE core-based accelerator provides a performance gain of $12\times$ over a recently proposed dense accelerator (NeuroMAX). For sparse accelerators, it provides a performance gain of $4.2\times$, $2.38\times$, and $1.98\times$ over SCNN, Eyeriss v2, and SparTen, respectively.
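The idea of a binary mask representation with zero skipping can be illustrated in software. The Python sketch below is not the Sparse-PE hardware design, which the abstract does not detail. It is a minimal illustration, assuming each operand is stored as a bit mask plus a packed list of its non-zero values. The function names `compress` and `sparse_dot` and the sample vectors are hypothetical.

```python
def compress(dense):
    """Return (mask, values): bit i of mask is set when dense[i] is non-zero."""
    mask = 0
    values = []
    for i, x in enumerate(dense):
        if x != 0:
            mask |= 1 << i
            values.append(x)
    return mask, values


def sparse_dot(act_mask, act_vals, wt_mask, wt_vals):
    """Dot product that visits only positions where both operands are non-zero.

    Returns (result, number_of_multiply_accumulates_performed).
    """
    both = act_mask & wt_mask            # positions that contribute to the sum
    total = 0
    macs = 0
    while both:
        low = both & -both               # isolate the lowest set bit
        below = low - 1                  # all bit positions under it
        # Count of non-zeros before this position = index into packed values.
        a_idx = bin(act_mask & below).count("1")
        w_idx = bin(wt_mask & below).count("1")
        total += act_vals[a_idx] * wt_vals[w_idx]
        macs += 1
        both &= both - 1                 # clear the bit just processed
    return total, macs


# Illustrative (made-up) activation and weight vectors of length 8.
activations = [0, 3, 0, 0, 5, 2, 0, 1]
weights = [4, 0, 0, 7, 2, 0, 0, 3]
a_mask, a_vals = compress(activations)
w_mask, w_vals = compress(weights)
result, macs = sparse_dot(a_mask, a_vals, w_mask, w_vals)
print(result, macs)
```

In this example only two of the eight positions hold a non-zero activation and a non-zero weight, so the sketch performs two multiply-accumulates where a dense loop would perform eight. This is the kind of skipped work that two-sided sparsity makes available.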

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
