ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences (Nov 2024)

Robust Multi-modal Remote Sensing Image Semantic Segmentation Using Tuple Perturbation-based Contrastive Learning

  • J. Dai,
  • L. Zhou,
  • K. Duan,
  • Y. Zhao,
  • Y. Ye

DOI
https://doi.org/10.5194/isprs-annals-X-3-2024-77-2024
Journal volume & issue
Vol. X-3-2024
pp. 77–84

Abstract

Deep learning models exhibit promising potential in multi-modal remote sensing image semantic segmentation (MRSISS). However, the limited availability of labeled training samples significantly constrains the performance of these models. To address this, self-supervised learning (SSL) methods have garnered significant interest in the remote sensing community. Accordingly, this article proposes a novel multi-modal contrastive learning framework based on tuple perturbation. First, a tuple perturbation-based multi-modal contrastive learning network (TMCNet) is designed to better explore shared and modality-specific feature representations during the pre-training stage; its tuple perturbation module improves the network's ability to extract multi-modal features by generating more complex negative samples. In the fine-tuning stage, we develop a simple and effective multi-modal semantic segmentation network (MSSNet) that reduces noise by exploiting complementary information across modalities, integrating multi-modal features more effectively and thereby improving semantic segmentation performance. Extensive experiments on two published multi-modal image datasets of optical and SAR pairs show that the proposed framework achieves superior semantic segmentation performance compared with current state-of-the-art methods when labeled samples are limited.
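
The abstract does not specify how the perturbed tuples are constructed. As an illustration only, the sketch below shows one plausible reading in PyTorch, assuming a mixup-style perturbation that blends each matched SAR embedding with a mismatched one to create an extra hard negative per anchor; the function name, the mixing weight alpha, and the use of an InfoNCE-style objective are assumptions for exposition, not the authors' implementation.

    import torch
    import torch.nn.functional as F

    def tuple_perturbation_contrastive_loss(z_opt, z_sar, temperature=0.1, alpha=0.5):
        # z_opt, z_sar: (B, D) embeddings of co-registered optical and SAR
        # patches from the two modality encoders; row i of each is one tuple.
        z_opt = F.normalize(z_opt, dim=1)
        z_sar = F.normalize(z_sar, dim=1)

        # Cross-modal similarities: diagonal entries are the positive pairs,
        # off-diagonal entries are ordinary in-batch negatives.
        logits = z_opt @ z_sar.t() / temperature  # (B, B)

        # Assumed tuple perturbation: blend each SAR embedding with a
        # mismatched one, yielding a harder negative that lies closer to
        # the positive than a plain in-batch negative does.
        z_mix = F.normalize(alpha * z_sar + (1 - alpha) * torch.roll(z_sar, 1, dims=0), dim=1)
        extra_neg = (z_opt * z_mix).sum(dim=1, keepdim=True) / temperature
        logits = torch.cat([logits, extra_neg], dim=1)  # (B, B+1)

        # InfoNCE objective: each optical anchor must identify its matched
        # SAR embedding among all negatives, including the perturbed tuple.
        targets = torch.arange(z_opt.size(0), device=z_opt.device)
        return F.cross_entropy(logits, targets)

In this reading the perturbed tuple only enlarges the negative set; a symmetric SAR-to-optical term could be added by swapping the roles of z_opt and z_sar.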