IEEE Access (Jan 2020)

Deep Neural Network Acceleration With Sparse Prediction Layers

  • Zhongtian Yao,
  • Kejie Huang,
  • Haibin Shen,
  • Zhaoyan Ming

DOI
https://doi.org/10.1109/ACCESS.2020.2963941
Journal volume & issue
Vol. 8
pp. 6839 – 6848

Abstract


The ever-increasing computation cost of Convolutional Neural Networks (CNNs) makes it imperative for real-world applications to accelerate the key steps, especially inference. In this work, we propose an efficient yet general scheme called the Sparse Prediction Layer (SPL), which predicts and skips the trivial elements in a CNN layer. Pruned weights are used to predict the locations of the maximum values in max-pooling kernels and of the positive values before Rectified Linear Units (ReLUs). The precise values of these predicted important elements are then calculated selectively, and the complete outputs are restored from them. Our experiments on the ImageNet Large Scale Visual Recognition Competition (ILSVRC) 2012 show that SPL reduces Floating-point Operations (FLOPs) by 68.3%, 58.6%, and 59.5% on AlexNet, VGG-16, and ResNet-50, respectively, with an accuracy loss of less than 1% and without retraining. The proposed SPL scheme can further accelerate networks that have already been compressed by other pruning-based methods, e.g., a 50.2% FLOP reduction on a ResNet-50 previously pruned by Channel Pruning (CP). A special matrix multiplication called Sparse Result Matrix Multiplication (SRMM) is proposed to support the implementation of SPL, and its measured acceleration is in line with expectations.
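The abstract describes a two-pass idea: a cheap prediction pass with pruned weights decides which output elements matter (the max-pooling winners and the ReLU-positive values), and only those positions are then computed exactly, which is what SRMM supports. The following is a minimal NumPy sketch of that idea for a fully connected layer followed by ReLU; the max-pooling case is analogous. The function names (prune_weights, srmm, fc_relu_with_spl) and the magnitude-pruning choice are illustrative assumptions, not the authors' implementation.

import numpy as np

def prune_weights(W, keep_ratio=0.25):
    """Magnitude pruning (assumed here): keep only the largest-|w| fraction of weights."""
    thresh = np.quantile(np.abs(W), 1.0 - keep_ratio)
    return np.where(np.abs(W) >= thresh, W, 0.0)

def srmm(A, B, mask):
    """Sparse Result Matrix Multiplication: compute C = A @ B only where mask
    is True; the remaining entries are left at zero and never computed."""
    C = np.zeros((A.shape[0], B.shape[1]))
    rows, cols = np.nonzero(mask)
    for i, j in zip(rows, cols):
        C[i, j] = A[i, :] @ B[:, j]
    return C

def fc_relu_with_spl(x, W, keep_ratio=0.25):
    """Fully connected layer + ReLU with a sparse prediction pass.
    x: (batch, in_features), W: (in_features, out_features)."""
    W_pruned = prune_weights(W, keep_ratio)
    # 1) Cheap prediction pass with pruned weights.
    y_pred = x @ W_pruned
    # 2) Only positions predicted to survive the ReLU are kept.
    mask = y_pred > 0
    # 3) Exact values are computed only at the predicted positions (SRMM).
    y = srmm(x, W, mask)
    return np.maximum(y, 0.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 256))
    W = rng.standard_normal((256, 512)) * 0.05
    dense = np.maximum(x @ W, 0.0)
    sparse = fc_relu_with_spl(x, W)
    # Agreement is approximate: mispredicted positions introduce small errors,
    # which corresponds to the sub-1% accuracy loss reported in the abstract.
    print("mean |error|:", np.abs(dense - sparse).mean())

In this sketch the FLOP savings come from step 3: dot products are evaluated only for the masked output elements, at the cost of the cheap prediction pass in step 1.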

Keywords