IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2024)

Dual Encoder–Decoder Network for Land Cover Segmentation of Remote Sensing Image

  • Zhongchen Wang,
  • Min Xia,
  • Liguo Weng,
  • Kai Hu,
  • Haifeng Lin

DOI
https://doi.org/10.1109/JSTARS.2023.3347595
Journal volume & issue
Vol. 17
pp. 2372–2385

Abstract

Although vision transformer-based methods (ViTs) exhibit better performance than convolutional neural networks (CNNs) on image recognition tasks, their pixel-level semantic segmentation ability is limited by the lack of explicit exploitation of local inductive biases. A variety of hybrid ViT–CNN structures have recently been proposed, but these methods have poor multiscale fusion ability and cannot accurately segment high-resolution, content-rich, complex land cover remote sensing images. Therefore, a dual encoder–decoder network named DEDNet is proposed in this work. In the encoding stage, local and global information is extracted from the image by parallel CNN and transformer encoders. In the decoding stage, a cross-stage fusion module is constructed to provide neighborhood attention guidance, which improves the localization of small targets and effectively avoids intraclass inconsistency. At the same time, a multihead feature extraction module is proposed to strengthen the recognition of target boundaries and effectively avoid interclass ambiguity. Before the output, a fusion spatial pyramid pooling classifier is proposed to merge the outputs of the two decoding strategies. Experiments demonstrate that the proposed model has superior generalization performance and can handle a variety of land cover semantic segmentation tasks.

Keywords