Complex & Intelligent Systems (Mar 2024)

Knowledge distillation based on projector integration and classifier sharing

  • Guanpeng Zuo,
  • Chenlu Zhang,
  • Zhe Zheng,
  • Wu Zhang,
  • Ruiqing Wang,
  • Jingqi Lu,
  • Xiu Jin,
  • Zhaohui Jiang,
  • Yuan Rao

DOI
https://doi.org/10.1007/s40747-024-01394-3
Journal volume & issue
Vol. 10, no. 3
pp. 4521–4533

Abstract

Knowledge distillation transfers knowledge from a pre-trained teacher model to a student model, thereby accomplishing effective model compression. Previous studies have carefully crafted knowledge representations, loss function designs, and distillation location selections, but few have examined the role of classifiers in distillation. Prior experience shows that a model's final classifier plays an essential role in inference, so this paper attempts to narrow the performance gap between models by having the student model directly use the teacher model's classifier for final inference, which requires an additional projector to match the features of the student encoder to the teacher's classifier. However, a single projector cannot fully align the features, and integrating multiple projectors may yield better performance. Balancing projector size against performance, we experimentally determine suitable projector sizes for different network combinations and propose a simple method for projector integration. In this way, the student model projects its features and then uses the teacher model's classifier for inference, achieving performance similar to the teacher's. Extensive experiments on the CIFAR-100 and Tiny-ImageNet datasets show that our approach applies simply and effectively to various teacher–student frameworks.
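As a rough illustration of the idea described in the abstract, the following PyTorch sketch (not the authors' implementation) wires a student encoder to a frozen teacher classifier through an ensemble of projectors whose outputs are averaged. The module names, the linear-plus-BatchNorm projector design, the number of projectors, and the averaging scheme are all assumptions made here for illustration.

```python
# Minimal sketch (assumed design, not the paper's code): student features are
# mapped by several projectors into the teacher's feature space, integrated by
# averaging, and classified with the teacher's reused (frozen) classifier head.
import torch
import torch.nn as nn

class ProjectorEnsembleStudent(nn.Module):
    def __init__(self, student_encoder, teacher_classifier,
                 student_dim, teacher_dim, num_projectors=3):
        super().__init__()
        self.encoder = student_encoder          # trainable student backbone
        self.classifier = teacher_classifier    # shared teacher head
        for p in self.classifier.parameters():  # keep the teacher head frozen
            p.requires_grad = False
        # Independent projectors from student_dim to teacher_dim
        # (linear + BatchNorm is an assumption for this sketch).
        self.projectors = nn.ModuleList(
            nn.Sequential(nn.Linear(student_dim, teacher_dim),
                          nn.BatchNorm1d(teacher_dim))
            for _ in range(num_projectors)
        )

    def forward(self, x):
        f = self.encoder(x)  # student features, shape (batch, student_dim)
        # Integrate the projectors by averaging their projected features.
        f = torch.stack([proj(f) for proj in self.projectors]).mean(dim=0)
        return self.classifier(f)  # logits from the teacher's classifier
```

During distillation, only the student encoder and the projectors would be trained (e.g. with a cross-entropy or distillation loss on the logits) while the shared teacher classifier stays fixed; averaging is just one possible way to integrate the projectors.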

Keywords