IEEE Access (Jan 2022)

NeuRes: Highly Activated Neurons Responses Transfer via Distilling Sparse Activation Maps

  • Sharmen Akhter,
  • Md Imtiaz Hossain,
  • Md Delowar Hossain,
  • Eui-Nam Huh

DOI: https://doi.org/10.1109/ACCESS.2022.3227804
Journal volume & issue: Vol. 10, pp. 131555–131566

Abstract

In recent years, Knowledge Distillation has attracted significant interest for mobile, edge, and IoT devices because of its ability to transfer knowledge from a large, complex teacher to a lightweight student network. Intuitively, Knowledge Distillation forces the student to mimic the teacher’s neuron responses, deploying distillation losses as regularization terms to improve the student’s generalization. However, the non-linearity of the hidden layers and the high dimensionality of the feature maps make knowledge transfer difficult. Although numerous methods transfer the teacher’s neuron responses in the form of diverse feature characteristics, such as attention and contrastive representations, to the best of our knowledge no prior work has considered feature-level non-linearity during distillation. In this work, we ask: can feature-level non-linearity-based approaches improve student performance? To investigate this question, we propose a novel knowledge distillation technique called NeuRes (Neuron Responses), which distills Sparse Activation Maps (SAMs) to transfer the most highly activated neuron responses to the student and thereby enhance its representation capability. The proposed NeuRes selects the highly activated neuron responses that form the SAMs and transfers this knowledge using activation normalization. NeuRes also transfers translation-invariant features through auxiliary classifiers and augmented data to further improve the student’s generalization. Detailed ablation studies and extensive experiments on model compression, transferability, adversarial robustness, and few-shot learning verify that NeuRes outperforms state-of-the-art distillation techniques on standard benchmark datasets.
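The abstract does not give the exact formulation of the SAM distillation loss. The sketch below is a minimal PyTorch-style illustration of the general idea it describes: keep only the most highly activated responses per sample, normalize the resulting sparse map, and penalize the student for deviating from the teacher’s map. All names (sparse_activation_map, neures_loss, total_loss) and hyperparameters (top_k_ratio, alpha, beta, T) are illustrative assumptions, not the authors’ implementation.

    import torch
    import torch.nn.functional as F

    def sparse_activation_map(feat: torch.Tensor, top_k_ratio: float = 0.25) -> torch.Tensor:
        """Keep only the most activated spatial responses, then L2-normalize.

        feat: (B, C, H, W) hidden-layer feature map.
        """
        b = feat.size(0)
        # Aggregate channels into a single spatial activation map.
        amap = feat.abs().sum(dim=1).view(b, -1)            # (B, H*W)
        k = max(1, int(top_k_ratio * amap.size(1)))
        # Threshold at the k-th largest activation per sample.
        thresh = amap.topk(k, dim=1).values[:, -1:]         # (B, 1)
        sparse = torch.where(amap >= thresh, amap, torch.zeros_like(amap))
        # Activation normalization: make teacher/student maps scale-comparable.
        return F.normalize(sparse, p=2, dim=1)

    def neures_loss(s_feat, t_feat, top_k_ratio=0.25):
        # Match the student's sparse activation map to the teacher's.
        # Assumes matching spatial sizes (e.g., aligned via adaptive pooling).
        return F.mse_loss(sparse_activation_map(s_feat, top_k_ratio),
                          sparse_activation_map(t_feat, top_k_ratio))

    def total_loss(logits_s, logits_t, labels, s_feat, t_feat,
                   alpha=0.9, beta=100.0, T=4.0):
        # Overall objective sketch: cross-entropy + logit KD + SAM matching term.
        ce = F.cross_entropy(logits_s, labels)
        kd = F.kl_div(F.log_softmax(logits_s / T, dim=1),
                      F.softmax(logits_t / T, dim=1),
                      reduction="batchmean") * (T * T)
        return ce + alpha * kd + beta * neures_loss(s_feat, t_feat)

In this reading, the top-k thresholding enforces the sparsity that gives SAMs their name, so only the most informative responses are distilled, while the L2 normalization removes the magnitude mismatch between teacher and student activations before they are compared.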
