Self-Supervised Learning across the Spectrum

Jayanth Shenoy; Xingjian Davis Zhang; Bill Tao; Shlok Mehrotra; Rem Yang; Han Zhao; Deepak Vasisht

doi:10.3390/rs16183470

Remote Sensing (Sep 2024)

Affiliations

Jayanth Shenoy: University of Illinois Urbana-Champaign, Champaign, IL 61801, USA
Xingjian Davis Zhang: University of Illinois Urbana-Champaign, Champaign, IL 61801, USA
Bill Tao: University of Illinois Urbana-Champaign, Champaign, IL 61801, USA
Shlok Mehrotra: University of Illinois Urbana-Champaign, Champaign, IL 61801, USA
Rem Yang: University of Illinois Urbana-Champaign, Champaign, IL 61801, USA
Han Zhao: University of Illinois Urbana-Champaign, Champaign, IL 61801, USA
Deepak Vasisht: University of Illinois Urbana-Champaign, Champaign, IL 61801, USA

DOI: https://doi.org/10.3390/rs16183470
Journal volume & issue: Vol. 16, no. 18, p. 3470

Abstract

Satellite image time series (SITS) segmentation is crucial for many applications, such as environmental monitoring, land cover mapping, and agricultural crop type classification. However, training models for SITS segmentation remains challenging because labeled training data, which requires fine-grained annotation, is scarce. We propose S4, a new self-supervised pretraining approach that significantly reduces the need for labeled training data by exploiting two key properties of satellite imagery: (a) satellites capture images in different parts of the spectrum, such as radio and visible frequencies, and (b) satellite imagery is geo-registered, allowing for fine-grained spatial alignment. We use these properties to formulate the pretraining tasks in S4. To the best of our knowledge, S4 is the first multimodal and temporal approach for SITS segmentation. S4’s novelty stems from combining three properties required for SITS self-supervision: (1) multiple modalities, (2) temporal information, and (3) pixel-level feature extraction. We also curate m2s2-SITS, a large-scale dataset of unlabeled, spatially aligned, multimodal, and geographic-specific SITS that serves as representative pretraining data for S4. Finally, we evaluate S4 on multiple SITS segmentation datasets and demonstrate its efficacy against competing baselines when labeled data is limited. Through extensive comparisons and ablation studies, we demonstrate S4’s effectiveness as a feature extractor for downstream semantic segmentation.

Published in Remote Sensing

ISSN: 2072-4292 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science
Website: http://www.mdpi.com/journal/remotesensing/
