Feature Guided Masked Autoencoder for Self-Supervised Learning in Remote Sensing

Yi Wang; Hugo Hernandez Hernandez; Conrad M Albrecht; Xiao Xiang Zhu

doi:10.1109/JSTARS.2024.3493237

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2025)

Affiliations

Yi Wang: Chair of Data Science in Earth Observation, Technical University of Munich, Munich, Germany
Hugo Hernandez Hernandez: Chair of Data Science in Earth Observation, Technical University of Munich, Munich, Germany
Conrad M Albrecht: Remote Sensing Technology Institute, German Aerospace Center (DLR), Weßling, Germany
Xiao Xiang Zhu: Chair of Data Science in Earth Observation, Technical University of Munich, Munich, Germany

Journal volume & issue: Vol. 18, pp. 321–336

Abstract


Self-supervised learning guided by masked image modeling, such as the masked autoencoder (MAE), has attracted wide attention for pretraining vision transformers in remote sensing. However, MAE tends to focus excessively on pixel details, which limits the model's capacity for semantic understanding, particularly for noisy synthetic aperture radar (SAR) images. In this article, we explore spectral and spatial remote sensing image features as improved MAE reconstruction targets. We first conduct a study on reconstructing various image features, all of which perform comparably to or better than raw pixels. Based on these observations, we propose feature guided MAE (FG-MAE), which reconstructs a combination of histograms of oriented gradients (HOG) and normalized difference indices (NDI) for multispectral images, and HOG alone for SAR images. Experimental results on three downstream tasks demonstrate the effectiveness of FG-MAE, with a particular boost for SAR imagery (e.g., up to 5% better than MAE on EuroSAT-SAR). Furthermore, we demonstrate that FG-MAE inherits the scalability of MAE, and we release a first series of pretrained vision transformers for medium-resolution SAR and multispectral images.
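The abstract replaces MAE's raw-pixel target with hand-crafted features: HOG plus normalized difference indices for multispectral input, and HOG only for SAR. The NumPy sketch below illustrates how such per-patch targets could be built and scored with an MAE-style masked loss. It is not the authors' implementation. All function names are hypothetical, and several details are assumptions made here:

- the simplified HOG (9 unsigned orientation bins, 8-pixel cells, per-cell L2 normalization, computed inside each 16-pixel patch);
- the band indices used for the NDI;
- concatenating HOG and NDI into a single target vector;
- the 75% mask ratio, which is MAE's default.

```python
from typing import Sequence, Tuple

import numpy as np


def normalized_difference(band_a: np.ndarray, band_b: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Normalized difference index (a - b) / (a + b); e.g. NDVI uses a = NIR, b = Red."""
    a = band_a.astype(np.float64)
    b = band_b.astype(np.float64)
    return (a - b) / (a + b + eps)  # eps guards against division by zero


def cell_hog(channel: np.ndarray, cell: int = 8, n_bins: int = 9) -> np.ndarray:
    """Simplified HOG for one 2-D channel: per-cell histograms of unsigned gradient
    orientation, weighted by gradient magnitude and L2-normalized per cell.
    Returns shape (H // cell, W // cell, n_bins)."""
    gy, gx = np.gradient(channel.astype(np.float64))   # derivatives along rows, cols
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)             # unsigned orientation in [0, pi)
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)  # clamp rounding at pi
    hc, wc = channel.shape[0] // cell, channel.shape[1] // cell
    hist = np.zeros((hc, wc, n_bins))
    for i in range(hc):
        for j in range(wc):
            rows = slice(i * cell, (i + 1) * cell)
            cols = slice(j * cell, (j + 1) * cell)
            hist[i, j] = np.bincount(bins[rows, cols].ravel(),
                                     weights=mag[rows, cols].ravel(), minlength=n_bins)
    return hist / (np.linalg.norm(hist, axis=-1, keepdims=True) + 1e-6)


def patch_targets(image: np.ndarray, ndi_pairs: Sequence[Tuple[int, int]] = (),
                  patch: int = 16, cell: int = 8) -> np.ndarray:
    """Per-patch reconstruction targets for a (C, H, W) image.
    Multispectral: HOG of every band plus the NDI pixels of each (a, b) band pair.
    SAR: pass no ndi_pairs to get HOG-only targets.
    Returns shape (num_patches, target_dim)."""
    c, h, w = image.shape
    ndis = [normalized_difference(image[a], image[b]) for a, b in ndi_pairs]
    targets = []
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            parts = [cell_hog(image[k, i:i + patch, j:j + patch], cell).ravel() for k in range(c)]
            parts += [ndi[i:i + patch, j:j + patch].ravel() for ndi in ndis]
            targets.append(np.concatenate(parts))
    return np.stack(targets)


def random_mask(num_patches: int, mask_ratio: float, rng: np.random.Generator) -> np.ndarray:
    """MAE-style random masking; True marks a patch hidden from the encoder."""
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.permutation(num_patches)[:int(round(num_patches * mask_ratio))]] = True
    return mask


def masked_mse(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    """Mean squared error over masked patches only, as in MAE."""
    per_patch = ((pred - target) ** 2).mean(axis=-1)
    return float(per_patch[mask].mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tile = rng.random((4, 64, 64))               # hypothetical 4-band tile; band 3 = NIR, band 2 = Red
    t = patch_targets(tile, ndi_pairs=[(3, 2)])  # 16 patches x (4 * 36 HOG + 256 NDI values)
    print(t.shape)                               # (16, 400)
    mask = random_mask(t.shape[0], mask_ratio=0.75, rng=rng)
    pred = np.zeros_like(t)                      # stand-in for a decoder's output
    print(masked_mse(pred, t, mask))
```

Computing HOG inside each patch, rather than once over the whole tile, keeps every target local to the patch it describes. The cost is slightly different gradients at patch borders. For SAR input, one would call `patch_targets` on the backscatter channels with no `ndi_pairs`, so the targets are HOG only.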

Published in IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing

ISSN: 1939-1404 (Print); 2151-1535 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Ocean engineering; Science: Physics: Geophysics. Cosmic physics
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=4609443
