IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2025)
Enhancing Remote Sensing Semantic Segmentation Accuracy and Efficiency Through Transformer and Knowledge Distillation
Abstract
In semantic segmentation of remote sensing images, the shift from convolutional neural networks (CNNs) to transformers is driven by the latter's superior ability to capture global semantic information. However, most transformer-based methods suffer from slow inference and a limited ability to capture local features. To address these issues, this study designs a hybrid approach that combines knowledge distillation with a joint CNN and transformer architecture. First, this article proposes the dual-path convolutional transformer network (DP-CTNet), whose dual-path structure leverages the strengths of both CNNs and transformers. It incorporates a feature refinement module to optimize the transformer's feature learning, and a feature fusion module that merges CNN and transformer features to compensate for the transformer's insufficient learning of local features. Then, with DP-CTNet serving as the teacher model, pruning and knowledge distillation are employed to create the efficient DP-CTNet (EDP-CTNet), which offers superior segmentation speed and accuracy. Angle knowledge distillation (AKD) is proposed to enhance the transfer of feature knowledge from DP-CTNet during distillation, leading to improved EDP-CTNet performance. Experimental results demonstrate that DP-CTNet combines the respective advantages of CNNs and transformers, preserving local detail features while learning extensive sequential semantic information. EDP-CTNet not only delivers fast segmentation but also achieves high segmentation accuracy after AKD training. Compared with other models, the two models proposed in this article stand out in both accuracy and the visual quality of their results.
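The abstract does not define AKD, so the following is only an illustrative sketch of what an angle-based distillation term could look like, not the formulation used in the article. It assumes that teacher and student each produce one feature vector per position, and it penalizes the angle between paired vectors through the mean of 1 minus their cosine similarity. The function names `cosine_similarity` and `angle_distillation_loss` are hypothetical.

```python
import math


def cosine_similarity(u, v, eps=1e-12):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # eps guards against division by zero for all-zero vectors.
    return dot / max(norm_u * norm_v, eps)


def angle_distillation_loss(student_feats, teacher_feats):
    """Mean of (1 - cosine similarity) over paired feature vectors.

    Assumption: this is a generic angular-alignment loss chosen for
    illustration; the article's AKD may be defined differently.
    """
    if not student_feats or len(student_feats) != len(teacher_feats):
        raise ValueError("expected two non-empty lists of equal length")
    total = sum(
        1.0 - cosine_similarity(s, t)
        for s, t in zip(student_feats, teacher_feats)
    )
    return total / len(student_feats)


if __name__ == "__main__":
    teacher = [[1.0, 0.0], [0.0, 2.0]]
    student = [[0.5, 0.0], [1.0, 1.0]]
    print(angle_distillation_loss(student, teacher))
```

Because the term depends only on direction, a student feature that is a positive rescaling of the teacher feature incurs no penalty. This is one plausible reason to prefer an angular criterion when a pruned student has a different feature magnitude than its teacher.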
Keywords