IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2023)

Class-Guidance Network Based on the Pyramid Vision Transformer for Efficient Semantic Segmentation of High-Resolution Remote Sensing Images

  • Shuang Du,
  • Maohua Liu

DOI
https://doi.org/10.1109/JSTARS.2023.3285632
Journal volume & issue
Vol. 16
pp. 5578–5589

Abstract


Small inter-class differences and large intra-class variations in multicategory semantic segmentation are problems that the "encoder–decoder" structure of the fully convolutional neural network does not fully solve, leading to imprecise perception of easily confused categories. To address this issue, we argue that sufficient contextual information can provide the model with more interpretation clues. In addition, if we can mine class-specific perceptual information for each semantic class, we can enhance the information belonging to the corresponding class during decoding. We therefore propose the class-guidance network based on the pyramid vision transformer (PVT). In detail, with the PVT as the encoder network, the subsequent decoding process consists of three stages. First, we design a receptive field block that expands the receptive field to different degrees through parallel branches with different dilation rates. Second, we put forward a semantic guidance block that uses high-level features to guide the channel enhancement of low-level features. Third, we propose the class guidance block, which achieves class-aware guidance between adjacent features and refines the segmentation progressively. Experimental results on the Potsdam and Vaihingen datasets show that the method achieves overall accuracies of 88.91% and 88.87%, respectively.
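
The receptive field block described above follows the general idea of parallel dilated convolutions. The following is a minimal sketch of that idea in PyTorch, not the authors' implementation: the class name, channel counts, and dilation rates (1, 2, 4, 8) are illustrative assumptions, and the batch-norm/ReLU layout is a common convention rather than a detail taken from the paper.

```python
import torch
import torch.nn as nn


class ReceptiveFieldBlock(nn.Module):
    """Sketch of a multi-branch dilated-convolution block.

    Parallel 3x3 convolutions with different dilation rates enlarge the
    receptive field to different degrees; their outputs are concatenated
    and fused by a 1x1 convolution. All hyperparameters are illustrative.
    """

    def __init__(self, in_channels: int, out_channels: int,
                 dilations=(1, 2, 4, 8)):
        super().__init__()
        branch_channels = out_channels // len(dilations)
        self.branches = nn.ModuleList([
            nn.Sequential(
                # padding=d keeps the spatial size unchanged for a 3x3 kernel.
                nn.Conv2d(in_channels, branch_channels, kernel_size=3,
                          padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(branch_channels),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])
        # 1x1 convolution fuses the concatenated multi-scale responses.
        self.fuse = nn.Conv2d(branch_channels * len(dilations),
                              out_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))


if __name__ == "__main__":
    # Feature map with an assumed shape, standing in for a PVT encoder stage.
    feats = torch.randn(1, 64, 32, 32)
    out = ReceptiveFieldBlock(64, 128)(feats)
    print(out.shape)  # torch.Size([1, 128, 32, 32])
```

Because each branch preserves the spatial resolution, the fused output can be passed directly to the subsequent guidance blocks in the decoder; how the paper combines the branches (concatenation vs. summation) is not specified in the abstract, so concatenation is used here as one plausible choice.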

Keywords