Dianxin Kexue (Telecommunications Science), Sep 2024
Lightweighting Swin Transformer: an efficient strategy combining weight sharing, distillation, and pruning
Abstract
Swin Transformer, a hierarchical vision Transformer built on shifted windows, has attracted extensive attention in computer vision for its strong modeling capability. However, its high computational complexity limits its deployment on devices with constrained computational resources. To address this issue, a pruning-based compression method integrating weight sharing and distillation was proposed. First, weights were shared across layers, and transformation layers were added so that the shared weights could be transformed per layer, preserving representational diversity. Next, a parameter dependency mapping graph of the transformation blocks was constructed and analyzed, and a grouping matrix F was built to record the dependency relationships among all parameters and to identify groups of parameters to be pruned together. Finally, knowledge distillation was employed to restore the model's performance. Experiments on the public ImageNet-Tiny-200 dataset demonstrate that, with a 32% reduction in the model's computational complexity, the proposed method incurs a performance degradation as low as approximately 3%. This provides a solution for deploying high-performance artificial intelligence models in environments with limited computational resources.
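The following is a minimal PyTorch sketch of the cross-layer weight-sharing idea summarized above: one set of block weights is reused by several layers, and a small per-layer transformation restores diversity. The class name (SharedBlock), the choice of an MLP as the shared sub-block, and the use of a per-reuse linear layer are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class SharedBlock(nn.Module):
    """Illustrative sketch (not the paper's code): one shared weight set is
    reused across several layers, with a lightweight per-reuse linear
    transformation so the shared weights still yield layer-specific features."""

    def __init__(self, dim: int, num_reuses: int):
        super().__init__()
        # Weights shared by every reuse of this block (a simple MLP here).
        self.shared = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, dim * 4),
            nn.GELU(),
            nn.Linear(dim * 4, dim),
        )
        # One lightweight transformation layer per reuse to restore diversity.
        self.transforms = nn.ModuleList(
            nn.Linear(dim, dim) for _ in range(num_reuses)
        )

    def forward(self, x: torch.Tensor, reuse_idx: int) -> torch.Tensor:
        # Shared computation followed by the reuse-specific transformation,
        # with a residual connection.
        return x + self.transforms[reuse_idx](self.shared(x))


# Usage example: the same block parameters serve three consecutive layers.
block = SharedBlock(dim=96, num_reuses=3)
tokens = torch.randn(1, 56 * 56, 96)  # (batch, tokens, channels)
for i in range(3):
    tokens = block(tokens, reuse_idx=i)
```

Because only the small per-reuse transformation layers are unique to each layer, parameter count drops roughly in proportion to the number of reuses; in the paper's pipeline, these blocks would then be structurally pruned using the dependency grouping and recovered via distillation.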