Scientific Reports (Aug 2024)

RS-Dseg: semantic segmentation of high-resolution remote sensing images based on a diffusion model component with unsupervised pretraining

  • Zheng Luo,
  • Jianping Pan,
  • Yong Hu,
  • Lin Deng,
  • Yimeng Li,
  • Chen Qi,
  • Xunxun Wang

DOI
https://doi.org/10.1038/s41598-024-69022-1
Journal volume & issue
Vol. 14, no. 1
pp. 1–19

Abstract

Semantic segmentation plays a crucial role in interpreting remote sensing images, especially in high-resolution scenarios that contain fine object details, complex spatial information, and intricate texture structures. To better extract semantic information and address class imbalance in multiclass segmentation, we propose using diffusion models for remote sensing image semantic segmentation, together with a lightweight classification module based on a spatial-channel attention mechanism. Our approach combines unsupervised pretrained components with the classification module to accelerate model convergence. The diffusion model component, built on the UNet architecture, effectively captures multiscale features with rich contextual and edge information. The lightweight classification module, which leverages spatial-channel attention, focuses more efficiently on the spatial and channel regions that carry significant feature information. We evaluated our approach on three publicly available datasets: Potsdam, GID, and Five Billion Pixels, and our method achieved the best results on all three. On the GID dataset, the overall accuracy was 96.99%, the mean IoU was 92.17%, and the mean F1 score was 95.83%. During training, our model reached good performance after only 30 epochs. Compared with other models, our method reduces the parameter count, speeds up training, and offers clear performance advantages.
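To make the spatial-channel attention idea concrete, the following is a minimal NumPy sketch of a generic CBAM-style attention block: a channel gate driven by global average and max pooling through a shared two-layer MLP, followed by a spatial gate pooled across channels. This is an illustrative stand-in, not the paper's exact module; the MLP weights are random placeholders, and the spatial gate uses a simple sigmoid over pooled maps where the paper's module would learn a convolution.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def channel_attention(x, reduction=4, seed=0):
    """Gate each channel of x (C, H, W) using avg/max-pooled descriptors."""
    C = x.shape[0]
    avg = x.mean(axis=(1, 2))          # (C,) global average pooling
    mx = x.max(axis=(1, 2))            # (C,) global max pooling
    # Shared bottleneck MLP with random weights (illustrative only).
    rng = np.random.default_rng(seed)
    w1 = rng.standard_normal((C // reduction, C)) * 0.1
    w2 = rng.standard_normal((C, C // reduction)) * 0.1
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)
    gate = sigmoid(mlp(avg) + mlp(mx))  # (C,) per-channel weights in (0, 1)
    return x * gate[:, None, None]

def spatial_attention(x):
    """Gate each spatial location of x (C, H, W) using channel-pooled maps."""
    avg = x.mean(axis=0, keepdims=True)  # (1, H, W)
    mx = x.max(axis=0, keepdims=True)    # (1, H, W)
    gate = sigmoid(avg + mx)             # stand-in for a learned 7x7 conv
    return x * gate

x = np.random.default_rng(1).random((8, 4, 4))  # toy feature map
y = spatial_attention(channel_attention(x))
print(y.shape)  # gates preserve the feature-map shape
```

Because both gates lie in (0, 1), the block rescales features without changing their resolution, which is what lets such a module stay lightweight when appended to a UNet-style feature extractor.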