Remote Sensing (Aug 2024)

A Contrastive Learning Based Multiview Scene Matching Method for UAV View Geo-Localization

  • Qiyi He,
  • Ao Xu,
  • Yifan Zhang,
  • Zhiwei Ye,
  • Wen Zhou,
  • Ruijie Xi,
  • Qiao Lin

DOI
https://doi.org/10.3390/rs16163039
Journal volume & issue
Vol. 16, no. 16
p. 3039

Abstract

Multi-view scene matching refers to establishing a mapping between images of the same scene captured from different perspectives, such as those taken by unmanned aerial vehicles (UAVs) and satellites, and is crucial for the geo-localization of UAV views. However, the geometric discrepancies between images from different perspectives, combined with the limited onboard computing resources of UAVs, make matching UAV and satellite images challenging. In addition, the imbalance between positive and negative samples across drone and satellite images can destabilize model training. To address these challenges, this study proposes a novel and efficient cross-view geo-localization framework called MSM-Transformer. The framework employs the Dual Attention Vision Transformer (DaViT) as the backbone for feature extraction, which significantly enhances the modeling of global features and the contextual relevance of adjacent regions. A weight-sharing mechanism reduces model complexity, making MSM-Transformer well suited to deployment on embedded devices such as UAVs and satellites. Furthermore, the framework introduces a symmetric Decoupled Contrastive Learning (DCL) loss, which effectively mitigates the sample imbalance between satellite and UAV images. Experiments on the University-1652 dataset show that MSM-Transformer delivers the best matching results with a minimal number of parameters.
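To make the loss concrete, the following is a minimal NumPy sketch of a symmetric decoupled contrastive loss in the spirit the abstract describes: an InfoNCE-style objective in which the positive pair is removed from the denominator ("decoupled"), averaged over both matching directions (UAV→satellite and satellite→UAV). The function name, signature, and temperature value are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def symmetric_dcl_loss(u, v, tau=0.1):
    """Sketch of a symmetric Decoupled Contrastive Learning (DCL) loss.

    u, v: (N, d) L2-normalized embeddings of paired views; row i of u
    (UAV image) matches row i of v (satellite image). Unlike standard
    InfoNCE, the positive similarity is excluded from the denominator,
    which reduces sensitivity to the positive/negative sample imbalance.
    NOTE: an illustrative reconstruction, not the authors' code.
    """
    sim = (u @ v.T) / tau                     # (N, N) similarity logits
    n = sim.shape[0]
    pos = np.diag(sim)                        # matching-pair similarities
    mask = ~np.eye(n, dtype=bool)             # drop positives from denominator
    # log-sum-exp over negatives only, per row, in each direction
    neg_uv = np.log(np.exp(sim)[mask].reshape(n, n - 1).sum(axis=1))
    neg_vu = np.log(np.exp(sim.T)[mask].reshape(n, n - 1).sum(axis=1))
    loss_uv = (-pos + neg_uv).mean()          # UAV -> satellite direction
    loss_vu = (-pos + neg_vu).mean()          # satellite -> UAV direction
    return 0.5 * (loss_uv + loss_vu)
```

As a sanity check, a batch whose positive pairs are perfectly aligned should incur a much lower loss than one whose pairs are random, since the positive term is rewarded while only mismatched pairs populate the denominator.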

Keywords