IEEE Access (Jan 2023)

Remote Sensing Image Road Segmentation Method Integrating CNN-Transformer and UNet

  • Rui Wang,
  • Mingxiang Cai,
  • Zixuan Xia,
  • Zhicui Zhou

DOI
https://doi.org/10.1109/ACCESS.2023.3344797
Journal volume & issue
Vol. 11
pp. 144446 – 144455

Abstract

Real-time and accurate road information is crucial for updating electronic navigation maps. To address the low precision and poor robustness of current semantic segmentation methods for road extraction from remote sensing imagery, we propose an improved UNet road segmentation model that incorporates an attention mechanism. First, we introduce a CNN-Transformer hybrid structure into the encoder to strengthen the extraction of both global features and local details. Second, the traditional upsampling module in the decoder is replaced with a dual upsampling module to improve feature extraction and segmentation accuracy. Furthermore, the hard-swish activation function replaces ReLU to give a smoother activation curve, which helps improve generalization and non-linear feature extraction and helps avoid vanishing gradients. Finally, a combined loss function of cross entropy and Dice loss is used to constrain the segmentation results more tightly and further improve segmentation accuracy. The method is validated experimentally on the Ottawa Road Dataset and the Massachusetts Road Dataset. The results show that, compared with U-Net, PSPNet, DeepLab V3, and TransUNet, the proposed algorithm achieves the best mean intersection over union (MIoU), mean pixel accuracy (MPA), and F1 score. Its MPA reaches 95.48% on the Ottawa Road Dataset and 92.56% on the Massachusetts Road Dataset, indicating good performance in road extraction.
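
The two components the abstract names explicitly, the hard-swish activation and the combined cross entropy and Dice loss, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `hard_swish` and `bce_dice_loss`, the binary (road versus background) formulation, the equal weighting set by `dice_weight`, and the smoothing constant `eps` are all assumptions made for illustration, since the abstract does not state them.

```python
import numpy as np


def hard_swish(x):
    """Hard-swish: x * ReLU6(x + 3) / 6, a piecewise approximation of swish."""
    x = np.asarray(x, dtype=np.float64)
    return x * np.clip(x + 3.0, 0.0, 6.0) / 6.0


def bce_dice_loss(prob, target, dice_weight=1.0, eps=1e-7):
    """Binary cross entropy plus soft Dice loss over a road-probability map.

    prob   : predicted road probabilities in [0, 1]
    target : ground-truth mask with values 0 or 1
    dice_weight and eps are illustrative assumptions, not values from the paper.
    """
    # Clip probabilities so the logarithms below stay finite.
    prob = np.clip(np.asarray(prob, dtype=np.float64), eps, 1.0 - eps)
    target = np.asarray(target, dtype=np.float64)

    # Pixel-wise binary cross entropy, averaged over the image.
    ce = -np.mean(target * np.log(prob) + (1.0 - target) * np.log(1.0 - prob))

    # Soft Dice coefficient: overlap between prediction and mask.
    intersection = np.sum(prob * target)
    dice = (2.0 * intersection + eps) / (np.sum(prob) + np.sum(target) + eps)

    return ce + dice_weight * (1.0 - dice)
```

Roads usually occupy a small fraction of the pixels in a remote sensing tile. The cross entropy term scores each pixel independently, while the Dice term scores region overlap, which is a common motivation for pairing the two in segmentation tasks with strong class imbalance.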

Keywords