ICT-Net: A Framework for Multi-Domain Cross-View Geo-Localization with Multi-Source Remote Sensing Fusion

Min Wu; Sirui Xu; Ziwei Wang; Jin Dong; Gong Cheng; Xinlong Yu; Yang Liu

doi:10.3390/rs17121988

Remote Sensing (Jun 2025)

Affiliations

Min Wu: Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, Beihang University, Beijing 100191, China
Sirui Xu: Institute of Artificial Intelligence, Beihang University, Beijing 100191, China
Ziwei Wang: Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, Beihang University, Beijing 100191, China
Jin Dong: Beijing Academy of Blockchain and Edge Computing, Beijing 100191, China
Gong Cheng: Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, Beihang University, Beijing 100191, China
Xinlong Yu: Jianghuai Advanced Technology Center, Hefei 230088, China
Yang Liu: Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, Beihang University, Beijing 100191, China

DOI: https://doi.org/10.3390/rs17121988
Journal volume & issue: Vol. 17, no. 12, p. 1988

Abstract

Traditional single neural network-based geo-localization methods for cross-view imagery primarily rely on polar coordinate transformations while suffering from limited global correlation modeling capabilities. To address these fundamental challenges of weak feature correlation and poor scene adaptation, we present a novel framework termed ICT-Net (Integrated CNN-Transformer Network) that synergistically combines convolutional neural networks with Transformer architectures. Our approach harnesses the complementary strengths of CNNs in capturing local geometric details and Transformers in establishing long-range dependencies, enabling comprehensive joint perception of both local and global visual patterns. Furthermore, capitalizing on the Transformer’s flexible input processing mechanism, we develop an attention-guided non-uniform cropping strategy that dynamically eliminates redundant image patches with minimal impact on localization accuracy, thereby achieving enhanced computational efficiency. To facilitate practical deployment, we propose a deep embedding clustering algorithm optimized for rapid parsing of geo-localization information. Extensive experiments demonstrate that ICT-Net establishes new state-of-the-art localization accuracy on the CVUSA benchmark, achieving a top-1 recall rate improvement of 8.6% over previous methods. Additional validation on a challenging real-world dataset collected at Beihang University (BUAA) further confirms the framework’s effectiveness and practical applicability in complex urban environments, particularly showing 23% higher robustness to vegetation variations.
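
The attention-guided cropping idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so everything below is an assumption made for illustration: the function name `prune_patches`, the `keep_ratio` parameter, and the simple rule of keeping the highest-scoring patches are hypothetical and are not taken from ICT-Net itself.

```python
def prune_patches(patches, attention, keep_ratio=0.75):
    """Keep the patches with the highest attention scores, in original order.

    Hypothetical illustration only: `patches` is any sequence of patch
    tokens and `attention` holds one score per patch. The selection rule
    is a plain top-k and is not the paper's actual cropping strategy.
    """
    if len(patches) != len(attention):
        raise ValueError("patches and attention must have the same length")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")

    # Number of patches to keep (at least one).
    n_keep = max(1, round(len(patches) * keep_ratio))

    # Rank patch indices by descending attention score.
    # The sort is stable, so tied scores keep the lower index first.
    ranked = sorted(range(len(patches)), key=lambda i: attention[i], reverse=True)

    # Restore spatial order among the kept patches.
    kept = sorted(ranked[:n_keep])
    return [patches[i] for i in kept], kept


if __name__ == "__main__":
    tokens = ["p0", "p1", "p2", "p3"]
    scores = [0.10, 0.40, 0.05, 0.45]
    print(prune_patches(tokens, scores, keep_ratio=0.5))
```

Because a Transformer accepts a variable-length token sequence, the shortened patch list could be passed on directly, which is the property the abstract credits for the efficiency gain.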

Published in Remote Sensing

ISSN: 2072-4292 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science
Website: http://www.mdpi.com/journal/remotesensing/
