IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2025)
DS-SwinUNet: Redesigning Skip Connection With Double Scale Attention for Land Cover Semantic Segmentation
Abstract
In recent years, vision transformers, with their attention-based computation, have gradually displaced convolutional neural networks in computer vision, making pure transformer architectures a dominant trend. Despite significant advances in semantic segmentation models for remote sensing, a critical gap remains in capturing both local and global contextual information: existing models often excel at either fine-grained local detail or long-range dependencies, but not both. To address this gap, we propose DS-SwinUNet, which integrates convolutional operations with transformer-based attention through a novel DS-transformer block. The block combines a two-scale attention mechanism incorporating convolutional computation with a modified feed-forward network (FFN), and is placed in the skip connections of a Swin-UNet backbone. Experiments show that the proposed transformer module improves mIoU over the original Swin-UNet by 2.73% on the WHDLD dataset and by 0.41% on the OpenEarthMap dataset.
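The exact DS-transformer block design is given in the paper body. As a rough illustration of the double-scale idea only, the following minimal NumPy sketch attends over encoder features at two window sizes and fuses the results with a residual connection in the skip path. The window sizes, fusion weights, and function names are assumptions for illustration, not the authors' implementation, which additionally includes convolutional projections and the modified FFN.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_attention(x, win):
    """Self-attention within non-overlapping win x win windows.
    x: (H, W, C) feature map; H and W must be divisible by win."""
    H, W, C = x.shape
    out = np.zeros_like(x)
    for i in range(0, H, win):
        for j in range(0, W, win):
            tokens = x[i:i + win, j:j + win].reshape(-1, C)      # (win*win, C)
            attn = softmax(tokens @ tokens.T / np.sqrt(C))       # scaled dot-product
            out[i:i + win, j:j + win] = (attn @ tokens).reshape(win, win, C)
    return out

def double_scale_skip(x, small=2, large=4):
    """Hypothetical double-scale skip module: attend at two window
    sizes and fuse with equal weights plus a residual connection."""
    fused = 0.5 * (window_attention(x, small) + window_attention(x, large))
    return x + fused  # residual keeps encoder features flowing to the decoder

# Toy encoder feature map passed through the skip module.
feat = np.random.default_rng(0).normal(size=(8, 8, 16))
out = double_scale_skip(feat)
print(out.shape)  # (8, 8, 16)
```

The residual form means the skip connection degrades gracefully to the plain Swin-UNet skip if the attention output is small, which is one common motivation for placing attention refinement in the skip path rather than replacing it.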
Keywords