International Journal of Applied Earth Observation and Geoinformation (Dec 2022)
MCRN: A Multi-source Cross-modal Retrieval Network for remote sensing
Abstract
Remote sensing cross-modal retrieval (RSCR) is of increasing importance because it enables fast and flexible retrieval of valuable data from massive remote sensing (RS) image archives. However, traditional RSCR methods tend to focus on retrieval between two modalities; as the number of modalities increases, the widening semantic gap combined with the small amount of paired data prevents the model from learning a superior modal representation. In this paper, inspired by the vision-based modal center in RS, we construct a multi-source cross-modal retrieval network (MCRN) that unifies RS retrieval tasks across multiple retrieval sources. To address the data heterogeneity caused by multiple data sources, we propose a shared pattern transfer module (SPTM) based on pattern memory and combine it with generative adversarial learning to obtain semantic representations that are unbound from modality. Meanwhile, to cope with the scarcity of annotated data in RS scenarios, we unify multiple unimodal self-supervised frameworks, combining domain alignment and contrastive learning, to obtain robust pre-training parameters for the designed MCRN. Finally, we introduce a multi-source triplet loss, a unimodal contrastive loss, and a semantic consistency loss, which enable MCRN to achieve competitive results through multitask learning for semantic alignment. We construct the multimodal datasets M-RSICD and M-RSITMD, conduct extensive experiments, and provide a complete benchmark to facilitate the development of RS multi-source cross-modal retrieval. The code for the MCRN method and the proposed datasets are publicly available at [Link].
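The abstract describes three objectives (a multi-source triplet loss, a unimodal contrastive loss, and a semantic consistency loss) combined through multitask learning. The following is a minimal PyTorch sketch of how such a weighted multi-task objective could be wired up; the function signatures, the cosine-distance triplet formulation, the InfoNCE-style contrastive term, and the margin, temperature, and loss-weight values are illustrative assumptions and are not taken from the paper.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Simplified multi-source triplet term: pull matched cross-modal pairs
    # together and push mismatched ones apart in the shared embedding space.
    d_pos = 1.0 - F.cosine_similarity(anchor, positive)
    d_neg = 1.0 - F.cosine_similarity(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()

def contrastive_loss(z1, z2, temperature=0.07):
    # InfoNCE-style unimodal contrastive term between two views of the same
    # modality, as used in self-supervised pre-training.
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)

def consistency_loss(shared_a, shared_b):
    # Semantic consistency term: encourage the shared (modality-unbound)
    # representations of paired samples to agree.
    return F.mse_loss(shared_a, shared_b)

def multitask_loss(batch, lambdas=(1.0, 0.5, 0.5)):
    # Weighted sum of the three objectives; `batch` is assumed to hold the
    # embeddings produced by the modality encoders and the shared module.
    l_tri = triplet_loss(batch["anchor"], batch["positive"], batch["negative"])
    l_con = contrastive_loss(batch["view1"], batch["view2"])
    l_sem = consistency_loss(batch["shared_a"], batch["shared_b"])
    return lambdas[0] * l_tri + lambdas[1] * l_con + lambdas[2] * l_sem
```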