Sensors (Jun 2024)

NTCE-KD: Non-Target-Class-Enhanced Knowledge Distillation

  • Chuan Li,
  • Xiao Teng,
  • Yan Ding,
  • Long Lan

DOI
https://doi.org/10.3390/s24113617
Journal volume & issue
Vol. 24, no. 11
p. 3617

Abstract

Most logit-based knowledge distillation methods transfer soft labels from the teacher model to the student model via Kullback–Leibler divergence based on softmax, an exponential normalization function. However, the exponential nature of softmax tends to prioritize the largest class (the target class) while suppressing the smaller ones (the non-target classes), leading to an oversight of the non-target classes' significance. To address this issue, we propose Non-Target-Class-Enhanced Knowledge Distillation (NTCE-KD) to amplify the role of non-target classes in terms of both magnitude and diversity. Specifically, we present a magnitude-enhanced Kullback–Leibler (MKL) divergence that multi-shrinks the target class to strengthen the impact of non-target classes in terms of magnitude. Additionally, to enrich the diversity of non-target classes, we introduce a diversity-based data augmentation strategy (DDA), further enhancing overall performance. Extensive experimental results on the CIFAR-100 and ImageNet-1k datasets demonstrate that non-target classes are of great significance and that our method achieves state-of-the-art performance across a wide range of teacher–student pairs.
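To make the core idea concrete, the following is a minimal PyTorch-style sketch of a magnitude-enhanced KL loss in the spirit described above: the target-class logit is shrunk by several factors before softmax so that non-target classes receive more probability mass, and the resulting KL terms are averaged. The shrink factors, the temperature T, applying the shrink to both teacher and student logits, and the averaging scheme are all illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def mkl_divergence(student_logits, teacher_logits, target,
                   shrink_factors=(1.0, 0.5, 0.25), T=4.0):
    """Hypothetical sketch of a magnitude-enhanced KL divergence.

    The target-class logit is shrunk by each factor before softmax,
    which raises the relative probability of the non-target classes;
    the per-factor KL losses are then averaged.
    """
    one_hot = F.one_hot(target, num_classes=student_logits.size(1)).bool()
    losses = []
    for s in shrink_factors:
        # Shrink only the target-class logit; non-target logits stay unchanged.
        t_logits = torch.where(one_hot, teacher_logits * s, teacher_logits)
        s_logits = torch.where(one_hot, student_logits * s, student_logits)
        p_teacher = F.softmax(t_logits / T, dim=1)
        log_p_student = F.log_softmax(s_logits / T, dim=1)
        # Standard temperature-scaled KD loss for this shrink factor.
        losses.append(
            F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T
        )
    return torch.stack(losses).mean()
```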

Keywords