IEEE Access (Jan 2024)
Bridging the Knowledge Gap via Transformer-Based Multi-Layer Correlation Learning
Abstract
We tackle multi-layer knowledge distillation between deep models with heterogeneous architectures. The main challenge is that feature maps from the two networks can be mismatched in resolution and semantic level. To resolve this, we propose a novel transformer-based multi-layer correlation knowledge distillation (TMC-KD) method that bridges the knowledge gap between a pair of networks. Our method narrows the relational knowledge gap between teacher and student models by minimizing the discrepancy between their local and global feature correlations. Through extensive comparisons with recent KD methods on classification and detection tasks, we demonstrate the effectiveness and usefulness of TMC-KD.
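To make the correlation-matching idea concrete, the following is a minimal PyTorch sketch of a relational distillation loss that aligns channel-wise feature correlations between a student and a teacher. It is an illustrative sketch only, not the paper's TMC-KD method: the helper names (`correlation_matrix`, `correlation_kd_loss`) are hypothetical, and it assumes the two feature maps have already been projected to a common channel count (the paper's transformer-based alignment is omitted here).

```python
import torch
import torch.nn.functional as F

def correlation_matrix(feats: torch.Tensor) -> torch.Tensor:
    # feats: (batch, channels, H, W) feature map; flatten spatial dims
    b, c, h, w = feats.shape
    x = feats.view(b, c, h * w)
    x = F.normalize(x, dim=2)               # unit-norm each channel vector
    return torch.bmm(x, x.transpose(1, 2))  # (b, c, c) channel correlations

def correlation_kd_loss(student_feats: torch.Tensor,
                        teacher_feats: torch.Tensor) -> torch.Tensor:
    # Hypothetical loss: match the correlation structure of the student's
    # features to the (frozen) teacher's; channel counts must already agree.
    cs = correlation_matrix(student_feats)
    ct = correlation_matrix(teacher_feats.detach())
    return F.mse_loss(cs, ct)

# Usage sketch with dummy feature maps of equal channel width:
s = torch.randn(8, 64, 28, 28, requires_grad=True)
t = torch.randn(8, 64, 14, 14)
loss = correlation_kd_loss(s, t)
loss.backward()
```

Because the loss compares correlation matrices rather than raw activations, it tolerates the resolution mismatch noted above (the teacher's 14x14 map and the student's 28x28 map yield same-sized c x c matrices); resolving semantic-level mismatch is what the paper's transformer component addresses.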
Keywords