ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences (Nov 2024)

Robust Multi-modal Remote Sensing Image Semantic Segmentation Using Tuple Perturbation-based Contrastive Learning

  • J. Dai,
  • L. Zhou,
  • K. Duan,
  • Y. Zhao,
  • Y. Ye

DOI
https://doi.org/10.5194/isprs-annals-X-3-2024-77-2024
Journal volume & issue
Vol. X-3-2024
pp. 77–84

Abstract

Deep learning models exhibit promising potential in multi-modal remote sensing image semantic segmentation (MRSISS). However, the limited availability of labeled training samples significantly constrains the performance of these models. To address this, self-supervised learning (SSL) methods have garnered significant interest in the remote sensing community. Accordingly, this article proposes a novel multi-modal contrastive learning framework based on tuple perturbation. First, a tuple perturbation-based multi-modal contrastive learning network (TMCNet) is designed to better explore shared and modality-specific feature representations during the pre-training stage; its tuple perturbation module improves the network's ability to extract multi-modal features by generating more complex negative samples. In the fine-tuning stage, we develop a simple and effective multi-modal semantic segmentation network (MSSNet) that reduces noise by exploiting complementary information across modalities, integrating multi-modal features more effectively and thereby improving semantic segmentation performance. Extensive experiments on two published multi-modal image datasets of optical and SAR pairs show that the proposed framework achieves superior semantic segmentation performance compared with current state-of-the-art methods when labeled samples are limited.
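
The abstract does not specify how the perturbed tuples are constructed. As an illustration only, the sketch below shows one plausible reading in PyTorch, assuming a mixup-style perturbation that blends each matched SAR embedding with a mismatched one to create an extra hard negative per anchor; the function name, the mixing weight alpha, and the use of an InfoNCE-style objective are assumptions for exposition, not the authors' implementation.

    import torch
    import torch.nn.functional as F

    def tuple_perturbation_contrastive_loss(z_opt, z_sar, temperature=0.1, alpha=0.5):
        # z_opt, z_sar: (B, D) embeddings of co-registered optical and SAR
        # patches from the two modality encoders; row i of each is one tuple.
        z_opt = F.normalize(z_opt, dim=1)
        z_sar = F.normalize(z_sar, dim=1)

        # Cross-modal similarities: diagonal entries are the positive pairs,
        # off-diagonal entries are ordinary in-batch negatives.
        logits = z_opt @ z_sar.t() / temperature  # (B, B)

        # Assumed tuple perturbation: blend each SAR embedding with a
        # mismatched one, yielding a harder negative that lies closer to
        # the positive than a plain in-batch negative does.
        z_mix = F.normalize(alpha * z_sar + (1 - alpha) * torch.roll(z_sar, 1, dims=0), dim=1)
        extra_neg = (z_opt * z_mix).sum(dim=1, keepdim=True) / temperature
        logits = torch.cat([logits, extra_neg], dim=1)  # (B, B+1)

        # InfoNCE objective: each optical anchor must identify its matched
        # SAR embedding among all negatives, including the perturbed tuple.
        targets = torch.arange(z_opt.size(0), device=z_opt.device)
        return F.cross_entropy(logits, targets)

In this reading the perturbed tuple only enlarges the negative set; a symmetric SAR-to-optical term could be added by swapping the roles of z_opt and z_sar.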