Machine Learning with Applications (Dec 2024)
A survey on knowledge distillation: Recent advancements
Abstract
Deep learning has achieved notable success across academia, medicine, and industry. Its ability to identify complex patterns in large-scale data using models with millions of parameters has made it highly effective. However, deploying deep learning models remains a significant challenge because of their high computational demands. Knowledge distillation (KD) has emerged as a key technique for model compression and efficient knowledge transfer, enabling the deployment of deep learning models on resource-limited devices without compromising performance. This survey examines recent advancements in KD, highlighting key innovations in architectures, training paradigms, and application domains. We categorize contemporary KD methods into traditional approaches (response-based, feature-based, and relation-based knowledge distillation) and more recent paradigms (self-distillation, cross-modal distillation, and adversarial distillation). We also discuss emerging challenges, particularly distillation with limited data, privacy-preserving KD, and the interplay between KD and other model compression techniques such as quantization. The survey further explores applications across computer vision, natural language processing, and multimodal tasks, where KD has driven performance improvements and enhanced model compression. This review aims to give researchers and practitioners a comprehensive understanding of the state of the art in knowledge distillation, bridging foundational concepts with the latest methodologies and their practical implications.