Machine Learning with Applications (Dec 2024)

A survey on knowledge distillation: Recent advancements

  • Amir Moslemi
  • Anna Briskina
  • Zubeka Dang
  • Jason Li

Journal volume & issue
Vol. 18
p. 100605

Abstract

Deep learning has achieved notable success across academia, medicine, and industry. Its ability to identify complex patterns in large-scale data and to manage models with millions of parameters has made it highly advantageous. However, deploying deep learning models presents a significant challenge due to their high computational demands. Knowledge distillation (KD) has emerged as a key technique for model compression and efficient knowledge transfer, enabling the deployment of deep learning models on resource-limited devices without compromising performance. This survey examines recent advancements in KD, highlighting key innovations in architectures, training paradigms, and application domains. We categorize contemporary KD methods into traditional approaches, such as response-based, feature-based, and relation-based knowledge distillation, and advanced paradigms, including self-distillation, cross-modal distillation, and adversarial distillation strategies. Additionally, we discuss emerging challenges, particularly distillation under limited-data scenarios, privacy-preserving KD, and the interplay with other model compression techniques such as quantization. Our survey also explores applications across computer vision, natural language processing, and multimodal tasks, where KD has driven performance improvements and enabled substantial model compression. This review aims to provide researchers and practitioners with a comprehensive understanding of the state-of-the-art in knowledge distillation, bridging foundational concepts with the latest methodologies and practical implications.
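To make the response-based KD mentioned above concrete, the following is a minimal sketch, assuming a standard PyTorch setting in the style of Hinton-style soft-label distillation; the temperature T, weight alpha, and tensor shapes are illustrative assumptions, not values taken from this survey.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Combine a softened KL term (teacher -> student) with the usual
    cross-entropy on ground-truth labels. T and alpha are illustrative."""
    # Soften both output distributions with temperature T.
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_probs = F.log_softmax(student_logits / T, dim=-1)
    # T^2 rescales gradients so the soft term stays comparable to the hard CE term.
    kd_term = F.kl_div(log_probs, soft_targets, reduction="batchmean") * (T * T)
    ce_term = F.cross_entropy(student_logits, labels)
    return alpha * kd_term + (1 - alpha) * ce_term

# Usage: logits from a large teacher and a small student on the same batch.
teacher_logits = torch.randn(8, 10)
student_logits = torch.randn(8, 10, requires_grad=True)
labels = torch.randint(0, 10, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```

Feature-based and relation-based variants replace or augment this output-level loss with penalties on intermediate representations or on pairwise relations between samples, respectively.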

Keywords