Block-Wisely Supervised Network Pruning with Knowledge Distillation and Markov Chain Monte Carlo

Huidong Liu; Fang Du; Lijuan Song; Zhenhua Yu

doi:10.3390/app122110952

Applied Sciences (Oct 2022)

Affiliations

Huidong Liu: School of Information Engineering, Ningxia University, Yinchuan 750021, China
Fang Du: School of Information Engineering, Ningxia University, Yinchuan 750021, China
Lijuan Song: School of Information Engineering, Ningxia University, Yinchuan 750021, China
Zhenhua Yu: School of Information Engineering, Ningxia University, Yinchuan 750021, China

DOI: https://doi.org/10.3390/app122110952
Journal volume & issue: Vol. 12, no. 21, p. 10952

Abstract

Structural network pruning is an effective way to reduce network size when deploying deep networks on resource-constrained devices. Existing methods mainly use knowledge distillation from the last layer of the network to guide pruning of the whole network, so informative features from intermediate layers are not yet fully exploited to improve pruning efficiency and accuracy. In this paper, we propose a block-wisely supervised network pruning (BNP) approach that finds the optimal subnet of a baseline network using knowledge distillation and Markov Chain Monte Carlo. The baseline network is divided into small blocks, and block shrinkage is applied to each block independently and in the same manner. Specifically, block-wise representations of the baseline network supervise the subnet search by encouraging each block of the student network to imitate the behavior of the corresponding baseline block. Each block is assigned a score that measures its accuracy and efficiency, and the block search is conducted under a Markov Chain Monte Carlo scheme that samples blocks from the posterior. Knowledge distillation gives the student network effective feature representations, and Markov Chain Monte Carlo provides a sampling scheme for finding the optimal solution. Extensive evaluations on multiple network architectures and datasets show that BNP outperforms the state of the art. For instance, on the CIFAR-10 dataset it reduces the FLOPs of ResNet-110 by 61.24% while improving accuracy by 0.16%, yielding a more compact subnet than other methods.
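
The block search described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the score metric, the proposal distribution, or the candidate block widths, so `CANDIDATE_RATIOS`, `toy_block_loss`, `block_energy`, `flops_weight`, and `temperature` are all hypothetical stand-ins. The sketch runs a plain Metropolis-Hastings sampler over the keep-ratio of a single block, with a toy energy that trades an imitation error against a cost penalty.

```python
import math
import random

# Hypothetical candidate widths (fraction of channels kept) for one block.
CANDIDATE_RATIOS = [0.25, 0.5, 0.75, 1.0]


def toy_block_loss(ratio):
    """Stand-in for a block-wise distillation loss (assumption):
    a narrower student block imitates the baseline block less well."""
    return (1.0 - ratio) ** 2


def block_energy(ratio, flops_weight=0.5):
    """Lower is better: imitation error plus a penalty on relative cost.
    The cost is modeled as ratio**2, which is an assumption of this sketch."""
    return toy_block_loss(ratio) + flops_weight * ratio ** 2


def sample_block(n_steps=2000, temperature=0.02, seed=0):
    """Metropolis-Hastings over candidate ratios for a single block.
    The chain targets a distribution proportional to exp(-energy / temperature)."""
    rng = random.Random(seed)
    current = rng.choice(CANDIDATE_RATIOS)
    counts = {r: 0 for r in CANDIDATE_RATIOS}
    for _ in range(n_steps):
        # Uniform proposal over all candidates, hence symmetric.
        proposal = rng.choice(CANDIDATE_RATIOS)
        delta = block_energy(proposal) - block_energy(current)
        # Always accept improvements; accept worse moves with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = proposal
        counts[current] += 1
    return counts


counts = sample_block()
best_ratio = max(counts, key=counts.get)  # most frequently visited width
print(best_ratio, counts)
```

Because the abstract states that block shrinkage is applied to each block independently, a full search under these assumptions would repeat `sample_block` once per block, with the toy loss replaced by a measured distillation loss between the student block and the corresponding baseline block.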

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
