IEEE Access (Jan 2023)

Binarized Neural Network With Parameterized Weight Clipping and Quantization Gap Minimization for Online Knowledge Distillation

  • Ju Yeon Kang,
  • Chang Ho Ryu,
  • Tae Hee Han

DOI
https://doi.org/10.1109/ACCESS.2023.3238715
Journal volume & issue
Vol. 11
pp. 8057–8064

Abstract


As applications of artificial intelligence grow rapidly, numerous network compression algorithms have been developed to fit models within the limited computing resources of smartphones, edge, and IoT devices. Knowledge distillation (KD) transfers soft labels derived from a teacher model to a less-parameterized student model, achieving high accuracy with a reduced computational burden. Moreover, online KD enables parallel training through collaborative learning between the teacher and student networks, thereby enhancing training speed. A binarized neural network (BNN) offers an intriguing opportunity for aggressive compression at the expense of drastically degraded accuracy. In this study, two performance improvements are proposed for online KD when a BNN is applied as the student network: 1) parameterized weight clipping (PWC) to reduce dead weights in the student network and 2) quantization gap-aware adaptive temperature scheduling between the teacher and student networks. Compared with constant weight clipping (CWC), PWC improves top-1 test accuracy by 3.78% on the CIFAR-10 dataset by making the clipping bound trainable and thereby decreasing the gradient mismatch. Furthermore, the quantization gap-aware temperature scheduling increases top-1 test accuracy by 0.08% over online KD with a constant temperature. By combining both methods, the top-1 test accuracy reached 94.60% on the CIFAR-10 dataset, and the accuracy on the Tiny-ImageNet dataset was comparable to that of a 32-bit full-precision neural network.
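
To make the two ideas in the abstract concrete, the following is a minimal PyTorch sketch, not the authors' released code. It assumes a PACT-like trainable clipping bound for the binarized student weights and uses the mean absolute teacher-student logit difference as a stand-in proxy for the quantization gap; the names `PWCBinarize`, `qg_temperature`, and `online_kd_loss`, as well as the specific temperature formula and loss weighting, are hypothetical illustrations rather than the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PWCBinarize(nn.Module):
    """Binarize latent weights with a trainable clipping bound (PWC-style sketch).

    Latent weights are clipped to [-alpha, alpha] before the sign operation, so
    gradients vanish ("dead weights") only outside a learned range rather than a
    fixed one as in constant weight clipping (CWC).
    """

    def __init__(self, init_alpha: float = 1.0):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(init_alpha))

    def forward(self, w: torch.Tensor) -> torch.Tensor:
        a = self.alpha.abs()
        # Trainable clipping range; gradients reach alpha through the clip.
        w_clipped = torch.minimum(torch.maximum(w, -a), a)
        # Binary weights scaled by the learned bound.
        w_bin = torch.sign(w_clipped) * a
        # Straight-through estimator: forward pass uses w_bin, backward pass
        # routes the gradient through the clipped latent weights.
        return w_clipped + (w_bin - w_clipped).detach()


def qg_temperature(t_logits: torch.Tensor, s_logits: torch.Tensor,
                   base_t: float = 4.0) -> float:
    """Adapt the KD temperature using a proxy for the teacher-student
    quantization gap (here, the mean absolute logit difference)."""
    gap = (t_logits - s_logits).abs().mean().item()
    return base_t * (1.0 + gap)


def online_kd_loss(t_logits: torch.Tensor, s_logits: torch.Tensor,
                   targets: torch.Tensor, mix: float = 0.5) -> torch.Tensor:
    """Hard-label cross-entropy plus soft-label KL at the adaptive temperature."""
    T = qg_temperature(t_logits, s_logits)
    soft = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                    F.softmax(t_logits.detach() / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(s_logits, targets)
    return mix * soft + (1 - mix) * hard
```

In this sketch a larger logit gap raises the temperature, softening the teacher's distribution when the binarized student is far from matching it; the T*T factor keeps the gradient magnitude of the KL term comparable across temperatures, following standard KD practice.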

Keywords