Information (Nov 2024)

LGFA-MTKD: Enhancing Multi-Teacher Knowledge Distillation with Local and Global Frequency Attention

  • Xin Cheng,
  • Jinjia Zhou

DOI
https://doi.org/10.3390/info15110735
Journal volume & issue
Vol. 15, no. 11
p. 735

Abstract


Transferring the extensive and varied knowledge contained within multiple complex models into a more compact student model poses significant challenges in multi-teacher knowledge distillation. Traditional distillation approaches often fall short in this context, as they struggle to fully capture and integrate the wide range of valuable information from each teacher. The heterogeneity of the knowledge offered by different teacher models complicates the student model's ability to learn effectively and generalize well, ultimately resulting in subpar results. To overcome these constraints, we introduce a method that integrates both local and global frequency attention, aiming to substantially enhance the distillation process. By simultaneously attending to fine-grained local details and broad global patterns, our approach allows the student model to grasp the complex and diverse information provided by each teacher more effectively, thereby enhancing its learning capability. This dual-attention mechanism enables a more balanced assimilation of specific details and generalized concepts, resulting in a more robust and accurate student model. Extensive experimental evaluations on standard benchmarks demonstrate that our methodology consistently outperforms existing multi-teacher distillation methods in both accuracy and robustness. Specifically, our approach achieves an average performance improvement of 0.55% over CA-MKD, with a 1.05% gain under the best conditions. These findings suggest that frequency-based attention mechanisms can unlock new potential in knowledge distillation, model compression, and transfer learning.
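
The abstract does not include implementation details, so the following PyTorch-style sketch is only an illustration of how a combined local and global frequency attention could be used to weight per-teacher feature distillation. The class names, the patch size, the equal 0.5 mixing of the two attention maps, and the MSE feature loss are all assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch (not the paper's code): build a "local" and a "global"
# frequency attention map from the student's features, then use the combined
# map to modulate the feature-matching loss against each teacher.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FrequencyAttention(nn.Module):
    """Attention map derived from the magnitude spectrum of a feature map."""

    def __init__(self, channels, patch=None):
        super().__init__()
        self.patch = patch  # None -> global FFT; int -> per-patch (local) FFT
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        if self.patch is None:
            # Global view: FFT over the full spatial extent.
            mag = torch.fft.fft2(x, norm="ortho").abs()
        else:
            # Local view: FFT over non-overlapping patches, stitched back together.
            b, c, h, w = x.shape
            p = self.patch
            xp = x.unfold(2, p, p).unfold(3, p, p)        # B, C, H/p, W/p, p, p
            mag = torch.fft.fft2(xp, norm="ortho").abs()
            mag = mag.permute(0, 1, 2, 4, 3, 5).reshape(b, c, h, w)
        return torch.sigmoid(self.proj(mag))              # attention weights in (0, 1)


def weighted_multi_teacher_kd(student_feat, teacher_feats, local_att, global_att):
    """Average per-teacher feature loss, modulated by the dual frequency attention."""
    att = 0.5 * (local_att(student_feat) + global_att(student_feat))  # assumed equal mix
    losses = [F.mse_loss(att * student_feat, att * t.detach()) for t in teacher_feats]
    return torch.stack(losses).mean()


if __name__ == "__main__":
    s = torch.randn(2, 64, 32, 32)                        # student feature map
    ts = [torch.randn(2, 64, 32, 32) for _ in range(3)]   # three teacher feature maps
    loc = FrequencyAttention(64, patch=8)
    glo = FrequencyAttention(64, patch=None)
    print(weighted_multi_teacher_kd(s, ts, loc, glo).item())
```

In this sketch the global branch captures broad spectral structure of the whole feature map, while the local branch captures patch-level detail; weighting the distillation loss by their combination is one plausible way to balance fine-grained and global knowledge across teachers.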

Keywords