IEEE Access (Jan 2024)
Heterogeneous Knowledge Distillation Using Conceptual Learning
Abstract
Recent advances in deep learning have led to the development of large, high-performing models that have been pretrained on massive datasets. However, employing these models in real-world services requires fast inference speed and low computational complexity. This has driven an interest in model compression techniques, such as knowledge distillation, which transfers the knowledge learned by a teacher model to a smaller student model. However, traditional knowledge distillation models are limited in that the student learns from the teacher model only the knowledge needed to solve the given problem. Therefore, giving an appropriate answer to a case that has not been encountered yet is difficult. In this study, we propose a heterogeneous knowledge distillation method that distills knowledge through a teacher model that has obtained knowledge about higher concepts, not the knowledge that needs to be obtained. The proposed methodology is based on the pedagogical discovery that problems can be solved better by learning not only specific knowledge about the problem but also general knowledge of higher concepts. In particular, beyond the limitations where traditional knowledge distillation was only capable of transferring knowledge for the same task, one can anticipate performance enhancement in lightweight models and extended applicability of pre-trained teacher models through the transfer of heterogeneous knowledge using the proposed methodology. In addition, through classification experiments on 70,000 images from the machine learning benchmark dataset Fashion-MNIST, we confirmed that the proposed heterogeneous knowledge distillation methodology achieves superior performance in terms of classification accuracy than does traditional knowledge distillation.
Keywords