IEEE Access (Jan 2023)
ORSAS: An Output Row-Stationary Accelerator for Sparse Neural Networks
Abstract
Various pruning techniques and network compression methods make modern neural networks sparse in both weights and activations. However, GPUs (graphics processing units) and most customized CNN (convolutional neural network) accelerators do not take advantage of this sparsity, and recent accelerators designed for sparse neural networks suffer from low computational resource utilization. This paper first proposes an output row-stationary dataflow that exploits the sparsity of both weights and activations. It allows the accelerator to process weights and activations in their compressed form, leading to high utilization of computational resources, i.e., multipliers. In addition, a low-cost compression algorithm is adopted for both weights and input activations to reduce the power consumption of data access. Second, a Y-buffer is proposed to eliminate the repeated reading of input activations caused by halo effects, which arise when tiling large input feature maps (ifmaps). Third, an interleaved broadcasting mechanism is introduced to alleviate the load imbalance caused by the irregularity of sparse data. Finally, a prototype design called ORSAS is synthesized, placed, and routed in a SMIC 55 nm process. The evaluation results show that ORSAS occupies the smallest logic cell area and achieves the highest multiplier utilization among peer works when sparsity is high. ORSAS sustains multiplier utilization between 60% and 90% across the convolutional layers of popular sparse CNNs and achieves ultra-low power consumption and the highest efficiency among the compared designs.
Keywords