Remote Sensing (Apr 2022)

Semi-Supervised Adversarial Semantic Segmentation Network Using Transformer and Multiscale Convolution for High-Resolution Remote Sensing Imagery

  • Yalan Zheng,
  • Mengyuan Yang,
  • Min Wang,
  • Xiaojun Qian,
  • Rui Yang,
  • Xin Zhang,
  • Wen Dong

DOI: https://doi.org/10.3390/rs14081786
Journal volume & issue: Vol. 14, no. 8, p. 1786

Abstract

Semantic segmentation is a crucial approach for remote sensing interpretation, but high-precision segmentation results come at the cost of manually collecting massive pixelwise annotations. Remote sensing imagery contains complex and variable ground objects, and obtaining abundant manual annotations is expensive and arduous. The semi-supervised learning (SSL) strategy can enhance the generalization capability of a model trained with only a small number of labeled samples. In this study, a novel semi-supervised adversarial semantic segmentation network is developed for remote sensing information extraction. A multiscale input convolution module (MICM) is designed to extract sufficient local features, while a Transformer module (TM) is applied to model long-range dependencies. These modules are integrated to construct a segmentation network with a double-branch encoder. Additionally, a double-branch discriminator network with different convolution kernel sizes is proposed. The segmentation network and the discriminator are jointly trained under the semi-supervised adversarial learning (SSAL) framework to improve segmentation accuracy when labeled data are scarce. Taking building extraction as a case study, experiments on three datasets with different resolutions are conducted to validate the proposed network. Semi-supervised semantic segmentation models with DeepLabv2, the pyramid scene parsing network (PSPNet), UNet and TransUNet as backbone networks are used for performance comparisons. The results suggest that the approach effectively improves semantic segmentation accuracy: the F1 and mean intersection over union (mIoU) measures improve by 0.82–11.83% and 0.74–7.5%, respectively, over those of the compared methods.
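To make the double-branch design concrete, the following is a minimal PyTorch sketch of the idea the abstract describes: a multiscale convolution branch for local features, a Transformer branch for long-range context, fused for per-pixel prediction, plus a fully convolutional discriminator with two kernel sizes. All class names, channel widths, patch size, layer counts, and the fusion scheme are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiscaleConvBranch(nn.Module):
    """Parallel convolutions with different kernel sizes to capture local
    features at several scales (an assumed stand-in for the paper's MICM)."""
    def __init__(self, in_ch=3, out_ch=64):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch // 4, kernel_size=k, padding=k // 2)
            for k in (1, 3, 5, 7)
        ])

    def forward(self, x):
        return F.relu(torch.cat([b(x) for b in self.branches], dim=1))

class TransformerBranch(nn.Module):
    """Patch-embeds the image into tokens and applies self-attention for
    long-range dependency modeling (an assumed stand-in for the TM)."""
    def __init__(self, in_ch=3, dim=64, patch=8):
        super().__init__()
        self.patch = patch
        self.embed = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x):
        f = self.embed(x)                                    # (B, dim, H/p, W/p)
        B, C, H, W = f.shape
        tokens = self.encoder(f.flatten(2).transpose(1, 2))  # (B, HW, dim)
        f = tokens.transpose(1, 2).reshape(B, C, H, W)
        return F.interpolate(f, scale_factor=self.patch, mode="bilinear",
                             align_corners=False)

class DoubleBranchSegNet(nn.Module):
    """Concatenates the two branches' features and predicts per-pixel scores."""
    def __init__(self, in_ch=3, n_classes=2):
        super().__init__()
        self.conv_branch = MultiscaleConvBranch(in_ch, 64)
        self.trans_branch = TransformerBranch(in_ch, 64)
        self.head = nn.Conv2d(128, n_classes, kernel_size=1)

    def forward(self, x):
        fused = torch.cat([self.conv_branch(x), self.trans_branch(x)], dim=1)
        return self.head(fused)

class DoubleBranchDiscriminator(nn.Module):
    """Fully convolutional discriminator with two parallel kernel sizes (3 and 5)
    scoring segmentation maps as ground truth vs. prediction; an assumed
    simplification of the double-branch discriminator in the abstract."""
    def __init__(self, n_classes=2):
        super().__init__()
        def branch(k):
            return nn.Sequential(
                nn.Conv2d(n_classes, 32, k, stride=2, padding=k // 2),
                nn.LeakyReLU(0.2),
                nn.Conv2d(32, 64, k, stride=2, padding=k // 2),
                nn.LeakyReLU(0.2))
        self.small, self.large = branch(3), branch(5)
        self.out = nn.Conv2d(128, 1, kernel_size=1)  # per-location real/fake map

    def forward(self, seg_probs):
        return self.out(torch.cat([self.small(seg_probs),
                                   self.large(seg_probs)], dim=1))

if __name__ == "__main__":
    seg, disc = DoubleBranchSegNet(), DoubleBranchDiscriminator()
    x = torch.randn(2, 3, 256, 256)
    logits = seg(x)                      # (2, 2, 256, 256)
    score = disc(logits.softmax(dim=1))  # (2, 1, 64, 64) confidence map
    print(logits.shape, score.shape)
```

Under an SSAL-style setup, the segmentation network would additionally be trained to fool the discriminator on unlabeled images, while the discriminator's per-location confidence map could be used to select reliable regions of pseudo-labels; the exact losses and training schedule are those of the paper, not shown here.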

Keywords