Earth Observation Multi-Spectral Image Fusion with Transformers for Sentinel-2 and Sentinel-3 Using Synthetic Training Data

Pierre-Laurent Cristille; Emmanuel Bernhard; Nick L. J. Cox; Jeronimo Bernard-Salas; Antoine Mangin

doi:10.3390/rs16163107

Remote Sensing (Aug 2024)

Affiliations

Pierre-Laurent Cristille: ACRI-ST, Centre d’Etudes et de Recherche de Grasse (CERGA), 10 Av. Nicolas Copernic, 06130 Grasse, France
Emmanuel Bernhard: ACRI-ST, Centre d’Etudes et de Recherche de Grasse (CERGA), 10 Av. Nicolas Copernic, 06130 Grasse, France
Nick L. J. Cox: ACRI-ST, Centre d’Etudes et de Recherche de Grasse (CERGA), 10 Av. Nicolas Copernic, 06130 Grasse, France
Jeronimo Bernard-Salas: ACRI-ST, Centre d’Etudes et de Recherche de Grasse (CERGA), 10 Av. Nicolas Copernic, 06130 Grasse, France
Antoine Mangin: ACRI-ST, Centre d’Etudes et de Recherche de Grasse (CERGA), 10 Av. Nicolas Copernic, 06130 Grasse, France

DOI: https://doi.org/10.3390/rs16163107
Journal volume & issue: Vol. 16, no. 16, p. 3107

Abstract

With the increasing number of ongoing space missions for Earth Observation (EO), there is a need to enhance data products by combining observations from various remote sensing instruments. We introduce a new Transformer-based approach for data fusion, achieving up to a 10- to 30-fold increase in the spatial resolution of our hyperspectral data. We trained the network on a synthetic set of Sentinel-2 (S2) and Sentinel-3 (S3) images, simulated from the hyperspectral mission EnMAP (30 m resolution), leading to a fused product of 21 bands at a 30 m ground resolution. Performance was evaluated by fusing original S2 (12 bands at 10, 20, and 60 m resolution) and S3 (21 bands at 300 m resolution) images. To go beyond EnMAP’s ground resolution, the network was also trained using a generic set of non-EO images from the CAVE dataset. However, we found that training the network on contextually relevant data is crucial: the EO-trained network significantly outperformed the non-EO-trained one. Finally, we observed that the original network, trained at 30 m ground resolution, performed well when fed images at 10 m ground resolution, likely due to the flexibility of Transformer-based networks.
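The quoted 10- to 30-fold range can be checked with simple arithmetic on the ground resolutions given in the abstract. The sketch below assumes (this is a reading of the abstract, not something the record states explicitly) that the lower bound corresponds to going from the 300 m S3 grid to the 30 m fused product, and the upper bound to the case where the network is fed 10 m images. The names `S3_GSD_M`, `FUSED_GSD_M`, and `resolution_gain` are hypothetical, chosen for illustration only.

```python
# Ground sampling distances in metres, taken from the abstract.
S3_GSD_M = 300

# Assumed pairing of target resolutions with the two ends of the quoted range.
FUSED_GSD_M = {
    "30 m fused product": 30,
    "10 m input images": 10,
}


def resolution_gain(coarse_gsd_m, fine_gsd_m):
    """Return the linear factor by which the pixel size shrinks."""
    return coarse_gsd_m / fine_gsd_m


gains = {name: resolution_gain(S3_GSD_M, gsd) for name, gsd in FUSED_GSD_M.items()}

for name, gain in gains.items():
    # One coarse S3 pixel covers gain**2 pixels of the finer grid.
    print(f"{name}: {gain:.0f}x linear, {gain ** 2:.0f} fine pixels per S3 pixel")
```

Under these assumptions the linear gains are 10 and 30, which means a single 300 m S3 pixel is spread over 100 or 900 finer pixels, respectively.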

Published in Remote Sensing

ISSN: 2072-4292 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science
Website: http://www.mdpi.com/journal/remotesensing/
