Remote Sensing (Sep 2023)

CNN and Transformer Fusion for Remote Sensing Image Semantic Segmentation

  • Xin Chen,
  • Dongfen Li,
  • Mingzhe Liu,
  • Jiaru Jia

DOI
https://doi.org/10.3390/rs15184455
Journal volume & issue
Vol. 15, no. 18
p. 4455

Abstract

Read online

Semantic segmentation of remote sensing images has been widely used in environmental protection, geological disaster discovery, and natural resource assessment. With the rapid development of deep learning, convolutional neural networks (CNNs) have dominated semantic segmentation, relying on their powerful local information extraction capabilities. Due to the locality of convolution operation, it can be challenging to obtain global context information directly. However, Transformer has excellent potential in global information modeling. This paper proposes a new hybrid convolutional and Transformer semantic segmentation model called CTFuse, which uses a multi-scale convolutional attention module in the convolutional part. CTFuse is a serial structure composed of a CNN and a Transformer. It first uses convolution to extract small-size target information and then uses Transformer to embed large-size ground target information. Subsequently, we propose a spatial and channel attention module in convolution to enhance the representation ability for global information and local features. In addition, we also propose a spatial and channel attention module in Transformer to improve the ability to capture detailed information. Finally, compared to other models used in the experiments, our CTFuse achieves state-of-the-art results on the International Society of Photogrammetry and Remote Sensing (ISPRS) Vaihingen and ISPRS Potsdam datasets.

Keywords