IEEE Access (Jan 2023)

Binarized Neural Network With Parameterized Weight Clipping and Quantization Gap Minimization for Online Knowledge Distillation

  • Ju Yeon Kang,
  • Chang Ho Ryu,
  • Tae Hee Han

DOI
https://doi.org/10.1109/ACCESS.2023.3238715
Journal volume & issue
Vol. 11
pp. 8057–8064

Abstract


As applications of artificial intelligence grow rapidly, numerous network compression algorithms have been developed to fit models within the limited computing resources of smartphones, edge, and IoT devices. Knowledge distillation (KD) transfers soft labels derived from a teacher model to a less-parameterized student model, achieving high accuracy with a reduced computational burden. Moreover, online KD enables parallel training through collaborative learning between the teacher and student networks, thereby enhancing training speed. A binarized neural network (BNN) offers an intriguing opportunity for aggressive compression at the expense of drastically degraded accuracy. In this study, two performance improvements are proposed for online KD when a BNN is applied as the student network: 1) parameterized weight clipping (PWC) to reduce dead weights in the student network and 2) quantization gap-aware adaptive temperature scheduling between the teacher and student networks. Compared with constant weight clipping (CWC), PWC improves top-1 test accuracy by 3.78% on the CIFAR-10 dataset by making the clipping bound trainable and thereby decreasing the gradient mismatch. Furthermore, the quantization gap-aware temperature scheduling increases top-1 test accuracy by 0.08% over online KD with a constant temperature. By combining both methods, the top-1 test accuracy reached 94.60% on the CIFAR-10 dataset, and the accuracy on the Tiny-ImageNet dataset was comparable to that of a 32-bit full-precision neural network.
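
To make the two ideas in the abstract concrete, the following is a minimal PyTorch sketch, not the authors' released code. It assumes a PACT-like trainable clipping bound for the binarized student weights and uses the mean absolute teacher-student logit difference as a stand-in proxy for the quantization gap; the names `PWCBinarize`, `qg_temperature`, and `online_kd_loss`, as well as the specific temperature formula and loss weighting, are hypothetical illustrations rather than the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PWCBinarize(nn.Module):
    """Binarize latent weights with a trainable clipping bound (PWC-style sketch).

    Latent weights are clipped to [-alpha, alpha] before the sign operation, so
    gradients vanish ("dead weights") only outside a learned range rather than a
    fixed one as in constant weight clipping (CWC).
    """

    def __init__(self, init_alpha: float = 1.0):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(init_alpha))

    def forward(self, w: torch.Tensor) -> torch.Tensor:
        a = self.alpha.abs()
        # Trainable clipping range; gradients reach alpha through the clip.
        w_clipped = torch.minimum(torch.maximum(w, -a), a)
        # Binary weights scaled by the learned bound.
        w_bin = torch.sign(w_clipped) * a
        # Straight-through estimator: forward pass uses w_bin, backward pass
        # routes the gradient through the clipped latent weights.
        return w_clipped + (w_bin - w_clipped).detach()


def qg_temperature(t_logits: torch.Tensor, s_logits: torch.Tensor,
                   base_t: float = 4.0) -> float:
    """Adapt the KD temperature using a proxy for the teacher-student
    quantization gap (here, the mean absolute logit difference)."""
    gap = (t_logits - s_logits).abs().mean().item()
    return base_t * (1.0 + gap)


def online_kd_loss(t_logits: torch.Tensor, s_logits: torch.Tensor,
                   targets: torch.Tensor, mix: float = 0.5) -> torch.Tensor:
    """Hard-label cross-entropy plus soft-label KL at the adaptive temperature."""
    T = qg_temperature(t_logits, s_logits)
    soft = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                    F.softmax(t_logits.detach() / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(s_logits, targets)
    return mix * soft + (1 - mix) * hard
```

In this sketch a larger logit gap raises the temperature, softening the teacher's distribution when the binarized student is far from matching it; the T*T factor keeps the gradient magnitude of the KL term comparable across temperatures, following standard KD practice.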

Keywords