Optimal Knowledge Distillation through Non-Heuristic Control of Dark Knowledge

Darian Onchis; Codruta Istin; Ioan Samuila

doi:10.3390/make6030094

Machine Learning and Knowledge Extraction (Aug 2024)

Affiliations

Darian Onchis: Department of Computer Science, West University of Timisoara, 300223 Timisoara, Romania
Codruta Istin: Department of Computer and Information Technology, Politehnica University of Timisoara, 300006 Timisoara, Romania
Ioan Samuila: Department of Computer Science, West University of Timisoara, 300223 Timisoara, Romania

DOI: https://doi.org/10.3390/make6030094
Journal volume & issue: Vol. 6, no. 3, pp. 1921–1935

Abstract

In this paper, a method is introduced to control the dark knowledge values, also known as soft targets, with the purpose of improving training by knowledge distillation for multi-class classification tasks. Knowledge distillation transfers knowledge from a larger model to a smaller model to achieve efficient, fast, and generalizable performance while retaining much of the original accuracy. Most deep neural models used for classification tasks append a SoftMax layer to generate output probabilities; it is usual to take the highest score as the inference of the model, while the remaining probability values are generally ignored. The focus here is on those remaining probabilities as carriers of dark knowledge, and our aim is to quantify the relevance of dark knowledge not heuristically, as in the literature so far, but with an inductive proof on the SoftMax operational limits. These limits are further pushed by using an incremental decision tree with information gain split. The user can set a desired precision and an accuracy level to obtain a maximal temperature setting for a continual classification process. Moreover, by fitting both the hard targets and the soft targets, one obtains an optimal knowledge distillation effect that better mitigates catastrophic forgetting. The strengths of our method come from the possibility of controlling the amount of distillation transferred non-heuristically and from the model-agnostic applicability of this model-independent study.
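
To make the role of the temperature concrete, the following is a minimal sketch of a temperature-scaled SoftMax and of a loss that fits both hard and soft targets. It is not the authors' implementation and does not reproduce their non-heuristic temperature bound or the incremental decision tree. It follows the commonly used distillation formulation, in which the soft-target term is scaled by the squared temperature. The function names, the weighting parameter `alpha`, and the example logits are assumptions chosen purely for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled SoftMax; a higher temperature flattens the output."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    shift = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted one."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a hard-target loss and a soft-target loss.

    `alpha` (hypothetical name) weights the hard-target term; the soft-target
    term is multiplied by temperature**2, as in the common formulation.
    """
    n = len(student_logits)
    hard_target = [1.0 if i == true_label else 0.0 for i in range(n)]
    hard_loss = cross_entropy(hard_target, softmax(student_logits, 1.0))

    soft_target = softmax(teacher_logits, temperature)  # the "dark knowledge"
    soft_loss = cross_entropy(soft_target, softmax(student_logits, temperature))

    return alpha * hard_loss + (1.0 - alpha) * temperature ** 2 * soft_loss


if __name__ == "__main__":
    teacher_logits = [6.0, 2.0, 1.0]  # illustrative values only
    for t in (1.0, 4.0, 16.0):
        probs = softmax(teacher_logits, t)
        print(f"T={t}: {[round(p, 3) for p in probs]}")
    print(distillation_loss([5.0, 2.0, 1.0], teacher_logits, true_label=0))
```

Running the script prints the teacher distribution at several temperatures. As the temperature grows, probability mass moves from the top class to the remaining classes, which is the information the soft targets carry to the student.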

Published in Machine Learning and Knowledge Extraction

ISSN: 2504-4990 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware
Website: https://www.mdpi.com/journal/make
