IEEE Access (Jan 2023)
ORSAS: An Output Row-Stationary Accelerator for Sparse Neural Networks
Abstract
Various pruning techniques and network compression methods make modern neural networks sparse in both weights and activations. However, GPUs (graphics processing units) and most customized CNN (convolutional neural network) accelerators do not take advantage of this sparsity, and recent accelerators designed for sparse neural networks suffer from low computational resource utilization. This paper first proposes an output row-stationary dataflow that exploits the sparsity of both weights and activations. It allows the accelerator to process weights and activations in their compressed form, leading to high utilization of computational resources, i.e., multipliers. In addition, a low-cost compression algorithm is adopted for both weights and input activations to reduce the power consumption of data access. Second, a Y-buffer is proposed to eliminate the repeated reading of input activations caused by halo effects, which arise when tiling large input feature maps (ifmaps). Third, an interleaved broadcasting mechanism is introduced to alleviate the load imbalance caused by the irregularity of sparse data. Finally, a prototype design called ORSAS is synthesized, placed, and routed in a SMIC 55 nm process. The evaluation results show that ORSAS occupies the smallest logic cell area and achieves the highest multiplier utilization among peer works when sparsity is high. ORSAS sustains multiplier utilization between 60% and 90% across the convolutional layers of popular sparse CNNs and achieves ultra-low power consumption and the highest efficiency among the compared designs.
Keywords