IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2022)

Self-Supervised Learning for Invariant Representations From Multi-Spectral and SAR Images

  • Pallavi Jain,
  • Bianca Schoen-Phelan,
  • Robert Ross

DOI
https://doi.org/10.1109/JSTARS.2022.3204888
Journal volume & issue
Vol. 15
pp. 7797 – 7808

Abstract

Self-supervised learning (SSL) has become the new state of the art in classification and segmentation tasks across several domains. One popular category of SSL is distillation networks, such as Bootstrap Your Own Latent (BYOL). This work proposes RS-BYOL, which builds on BYOL in the remote sensing (RS) domain, where data differ nontrivially from natural RGB images. Since multispectral (MS) and synthetic aperture radar (SAR) sensors provide varied spectral and spatial resolution information, we utilize them as an implicit augmentation to learn invariant feature embeddings. To learn RS-based invariant features with SSL, we trained RS-BYOL in two ways: single-channel feature learning and three-channel feature learning. This work explores the usefulness of single-channel feature learning from random 10 MS bands of 10–20 m resolution and the VV and VH SAR bands, compared to the common notion of using three or more bands. In our linear probing evaluation, these single-channel features reached a 0.92 F1 score on the EuroSAT classification task and 59.6 mIoU on the IEEE Data Fusion Contest segmentation task for certain single bands. We also compare our results with ImageNet weights and show that the RS-based SSL model outperforms the supervised ImageNet-based model. We further explore the usefulness of multimodal data compared to single-modality data, and show that utilizing MS and SAR data allows better invariant representations to be learnt than utilizing only MS data.
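
As a rough illustration of the idea of treating sensor bands as an implicit augmentation, the sketch below draws one MS band and one SAR polarization of the same scene as two views and scores a pair of embeddings with the normalized mean squared error used in BYOL, which equals 2 - 2·cos(p, z). It is a minimal sketch under stated assumptions, not the RS-BYOL implementation: the function names, the band keys, and the toy vectors are hypothetical, and the encoder, projector, predictor, and momentum-updated target network are omitted.

```python
import math
import random


def byol_loss(p, z):
    """BYOL-style normalized MSE between an online prediction p and a
    target projection z, written as 2 - 2 * cosine_similarity(p, z)."""
    dot = sum(a * b for a, b in zip(p, z))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_z = math.sqrt(sum(b * b for b in z))
    return 2.0 - 2.0 * dot / (norm_p * norm_z)


def sample_single_channel_views(ms_bands, sar_bands, rng):
    """Pick one MS band and one SAR polarization of the same scene.

    Assumption: both arguments are dicts mapping a band name to that
    band's pixel data; the keys used below are placeholders.
    """
    ms_name = rng.choice(sorted(ms_bands))
    sar_name = rng.choice(sorted(sar_bands))
    return (ms_name, ms_bands[ms_name]), (sar_name, sar_bands[sar_name])


# Toy usage with hypothetical band names and flattened "images".
rng = random.Random(0)
ms = {"ms_band_a": [0.1, 0.2], "ms_band_b": [0.3, 0.4]}
sar = {"VV": [0.5, 0.6], "VH": [0.7, 0.8]}
view_ms, view_sar = sample_single_channel_views(ms, sar, rng)

# Stand-in embeddings; in practice these would come from the networks.
loss_aligned = byol_loss([1.0, 0.0], [2.0, 0.0])
loss_orthogonal = byol_loss([1.0, 0.0], [0.0, 1.0])
```

The loss is zero when the two embeddings point in the same direction and grows as they diverge, so minimizing it across MS and SAR views pushes the representation toward being invariant to the sensor.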

Keywords