IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2023)

Enhancing Efficient Global Understanding Network With CSWin Transformer for Urban Scene Images Segmentation

  • Jie Zhang
  • Mingwen Shao
  • Yuanjian Qiao
  • Xiangyong Cao

DOI
https://doi.org/10.1109/JSTARS.2023.3328559
Journal volume & issue
Vol. 16
pp. 10230 – 10245

Abstract

Global context is crucial to semantic segmentation of remote sensing (RS) urban scene imagery because objects vary widely in size, closely resemble one another, and occlude each other. However, existing methods for extracting global context are limited when applied directly to very high-resolution RS images, mainly because of their high computational complexity and memory consumption. To alleviate this limitation, we propose a novel Efficient Global Understanding semantic segmentation Network (EGUNet) that extracts global context information efficiently enough to be applicable to RS images. Specifically, EGUNet is a hybrid U-shaped architecture combining convolutional neural networks (CNNs) and a Transformer: the encoder uses the CSWin Transformer to capture global semantic information, and the decoder uses a CNN structure to recover local detail information. The proposed EGUNet therefore combines strong global context extraction with the ability to recover local position information. In addition, three modules are proposed to improve segmentation accuracy and make EGUNet better suited to urban scene image segmentation. First, a feature adaptive fusion module is introduced in the decoder to improve the fusion of deep semantic features and location detail features. Second, an adaptive atrous spatial pyramid pooling module is designed at the skip connections to enhance the multiscale understanding of high-level semantic context. Finally, we introduce a lightweight enhanced segmentation head that uses the information from each decoder stage for segmentation. Extensive experiments on the ISPRS Vaihingen and Potsdam datasets show that EGUNet outperforms state-of-the-art methods in segmentation accuracy.
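
The computational and memory cost that the abstract attributes to global context extraction on very high-resolution images can be illustrated with a rough count of query-key pairs in an attention layer. The sketch below is not taken from the paper. It assumes the cross-shaped window scheme of the CSWin Transformer, in which a token attends only to the tokens in its own horizontal or vertical stripe, and it uses a hypothetical 128 x 128 token grid and stripe width of 2. The function names are illustrative, and the stripe width is assumed to divide the grid size evenly.

```python
def global_attention_pairs(height: int, width: int) -> int:
    """Query-key pairs per head when every token attends to every token."""
    tokens = height * width
    return tokens * tokens


def stripe_attention_pairs(
    height: int, width: int, stripe_width: int, horizontal: bool = True
) -> int:
    """Query-key pairs per head when each token attends only within its stripe.

    A horizontal stripe spans the full width and `stripe_width` rows;
    a vertical stripe spans the full height and `stripe_width` columns.
    """
    tokens = height * width
    keys_per_query = stripe_width * (width if horizontal else height)
    return tokens * keys_per_query


if __name__ == "__main__":
    # Hypothetical setting: a 128 x 128 token grid and a stripe width of 2.
    h = w = 128
    sw = 2
    full = global_attention_pairs(h, w)
    stripe = stripe_attention_pairs(h, w, sw)
    print(f"global attention pairs: {full}")
    print(f"stripe attention pairs: {stripe}")
    print(f"reduction factor:       {full // stripe}")
```

Under these assumptions the attention map for stripe attention grows with the stripe length rather than with the total number of tokens, which is the kind of saving that makes a Transformer encoder practical on large RS tiles. The count ignores projection layers and the decoder, so it is only an order-of-magnitude guide.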

Keywords