Progressive multi-level distillation learning for pruning network

Ruiqing Wang; Shengmin Wan; Wu Zhang; Chenlu Zhang; Yu Li; Shaoxiang Xu; Lifu Zhang; Xiu Jin; Zhaohui Jiang; Yuan Rao

doi:10.1007/s40747-023-01036-0

Complex & Intelligent Systems (Apr 2023)

Affiliations

Ruiqing Wang: School of Information and Computer, Anhui Agricultural University
Shengmin Wan: School of Information and Computer, Anhui Agricultural University
Wu Zhang: School of Information and Computer, Anhui Agricultural University
Chenlu Zhang: School of Information and Computer, Anhui Agricultural University
Yu Li: School of Information and Computer, Anhui Agricultural University
Shaoxiang Xu: School of Information and Computer, Anhui Agricultural University
Lifu Zhang: School of Information and Computer, Anhui Agricultural University
Xiu Jin: School of Information and Computer, Anhui Agricultural University
Zhaohui Jiang: School of Information and Computer, Anhui Agricultural University
Yuan Rao: School of Information and Computer, Anhui Agricultural University

DOI: https://doi.org/10.1007/s40747-023-01036-0
Journal volume & issue: Vol. 9, no. 5
pp. 5779 – 5791

Abstract

Although classification methods based on deep neural networks have achieved excellent results, they are difficult to apply in real-time scenarios because of their high memory footprints and prohibitive inference times. Compared with unstructured pruning, structured pruning reduces the runtime computation cost of a model more effectively, but it inevitably reduces the model's accuracy. Traditional methods use fine-tuning to restore the performance lost to pruning. However, a large gap still remains between the pruned model and the original one. In this paper, we use progressive multi-level distillation learning to compensate for the loss caused by pruning. The pre-pruning and post-pruning networks serve as the teacher and student networks, respectively. The proposed approach exploits the complementary properties of structured pruning and knowledge distillation: the pruned network learns both the intermediate and the output representations of the teacher network, which reduces the impact of pruning on the model. Experiments demonstrate that our approach performs better on the CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets across different pruning rates. For instance, GoogLeNet achieves near-lossless pruning on the CIFAR-10 dataset at a 60% pruning rate. Moreover, this paper also shows that applying the proposed distillation learning method during the pruning process yields larger performance gains than applying it after pruning is complete.
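
The idea of a pruned student learning both the intermediate and the output representations of its unpruned teacher can be illustrated with a minimal sketch. The abstract does not give the paper's actual loss, so everything below is an assumption made for illustration: the function names, the weights `alpha` and `beta`, the temperature, the use of a KL term on softened outputs, and the use of mean squared error on intermediate features. The sketch also assumes the student and teacher features already have matching sizes, which after structured pruning would normally require an extra projection step.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax, shifted by the max for numerical stability.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) between two discrete distributions.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def mse(a, b):
    # Mean squared error between two equally sized feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def cross_entropy(probs, label, eps=1e-12):
    # Standard classification loss against the ground-truth label.
    return -math.log(probs[label] + eps)

def multi_level_distillation_loss(student_logits, teacher_logits,
                                  student_feats, teacher_feats, label,
                                  temperature=4.0, alpha=0.5, beta=0.1):
    # Hypothetical combination of three terms (weights are illustrative):
    # 1) hard-label loss on the student's own prediction,
    hard = cross_entropy(softmax(student_logits), label)
    # 2) output-level distillation on temperature-softened distributions,
    soft = kl_divergence(softmax(teacher_logits, temperature),
                         softmax(student_logits, temperature)) * temperature ** 2
    # 3) intermediate-level distillation, averaged over the matched layers.
    feat = sum(mse(s, t) for s, t in zip(student_feats, teacher_feats)) / len(student_feats)
    return (1 - alpha) * hard + alpha * soft + beta * feat
```

When the student reproduces the teacher exactly, the two distillation terms vanish and only the weighted hard-label loss remains; any mismatch at the output or at an intermediate layer adds to the loss, which is the signal the pruned network would train against.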

Published in Complex & Intelligent Systems

ISSN: 2199-4536 (Print); 2198-6053 (Online)
Publisher: Springer
Country of publisher: Switzerland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science; Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: https://www.springer.com/journal/40747