IEEE Access (Jan 2020)

Deep Neural Network Acceleration With Sparse Prediction Layers

  • Zhongtian Yao,
  • Kejie Huang,
  • Haibin Shen,
  • Zhaoyan Ming

DOI
https://doi.org/10.1109/ACCESS.2020.2963941
Journal volume & issue
Vol. 8
pp. 6839 – 6848

Abstract


The ever-increasing computation cost of Convolutional Neural Networks (CNNs) makes it imperative for real-world applications to accelerate the key steps, especially inference. In this work, we propose an efficient yet general scheme called the Sparse Prediction Layer (SPL), which predicts and skips the trivial elements in a CNN layer. Pruned weights are used to predict the locations of the maximum values in max-pooling kernels and of the positive values before Rectified Linear Units (ReLUs). The precise values of these predicted important elements are then calculated selectively, and the complete outputs are restored from them. Our experiments on the ImageNet Large Scale Visual Recognition Competition (ILSVRC) 2012 show that SPL reduces Floating-point Operations (FLOPs) by 68.3%, 58.6%, and 59.5% on AlexNet, VGG-16, and ResNet-50, respectively, with an accuracy loss of less than 1% and without retraining. The proposed SPL scheme can further accelerate networks that have already been compressed by other pruning-based methods, e.g., a 50.2% FLOP reduction on a ResNet-50 previously pruned by Channel Pruning (CP). A special matrix multiplication called Sparse Result Matrix Multiplication (SRMM) is proposed to support the implementation of SPL, and its measured acceleration is in line with expectations.
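The abstract describes a two-pass idea: a cheap prediction pass with pruned weights decides which output elements matter (the max-pooling winners and the ReLU-positive values), and only those positions are then computed exactly, which is what SRMM supports. The following is a minimal NumPy sketch of that idea for a fully connected layer followed by ReLU; the max-pooling case is analogous. The function names (prune_weights, srmm, fc_relu_with_spl) and the magnitude-pruning choice are illustrative assumptions, not the authors' implementation.

import numpy as np

def prune_weights(W, keep_ratio=0.25):
    """Magnitude pruning (assumed here): keep only the largest-|w| fraction of weights."""
    thresh = np.quantile(np.abs(W), 1.0 - keep_ratio)
    return np.where(np.abs(W) >= thresh, W, 0.0)

def srmm(A, B, mask):
    """Sparse Result Matrix Multiplication: compute C = A @ B only where mask
    is True; the remaining entries are left at zero and never computed."""
    C = np.zeros((A.shape[0], B.shape[1]))
    rows, cols = np.nonzero(mask)
    for i, j in zip(rows, cols):
        C[i, j] = A[i, :] @ B[:, j]
    return C

def fc_relu_with_spl(x, W, keep_ratio=0.25):
    """Fully connected layer + ReLU with a sparse prediction pass.
    x: (batch, in_features), W: (in_features, out_features)."""
    W_pruned = prune_weights(W, keep_ratio)
    # 1) Cheap prediction pass with pruned weights.
    y_pred = x @ W_pruned
    # 2) Only positions predicted to survive the ReLU are kept.
    mask = y_pred > 0
    # 3) Exact values are computed only at the predicted positions (SRMM).
    y = srmm(x, W, mask)
    return np.maximum(y, 0.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 256))
    W = rng.standard_normal((256, 512)) * 0.05
    dense = np.maximum(x @ W, 0.0)
    sparse = fc_relu_with_spl(x, W)
    # Agreement is approximate: mispredicted positions introduce small errors,
    # which corresponds to the sub-1% accuracy loss reported in the abstract.
    print("mean |error|:", np.abs(dense - sparse).mean())

In this sketch the FLOP savings come from step 3: dot products are evaluated only for the masked output elements, at the cost of the cheap prediction pass in step 1.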

Keywords