Geo-spatial Information Science (Jan 2025)

Multi-level representation learning via ConvNeXt-based network for unaligned cross-view matching

  • Fangli Guan,
  • Nan Zhao,
  • Zhixiang Fang,
  • Ling Jiang,
  • Jianhui Zhang,
  • Yue Yu,
  • Haosheng Huang

DOI
https://doi.org/10.1080/10095020.2024.2439385

Abstract


Cross-view matching refers to retrieving the most relevant images across different platforms (e.g. drone and satellite views), where the key challenge lies in the differences in viewpoint and spatial resolution. However, most existing methods focus on extracting fine-grained features and ignore the contextual connections within the image. Therefore, we propose a novel ConvNeXt-based multi-level representation learning model for this task. First, we extract global features through the ConvNeXt model. To obtain a joint part-based representation from the global features, we then replicate them, applying spatial attention to one copy and a standard convolutional operation to the other. In addition, the features of the different branches are aggregated through a multilevel feature fusion module in preparation for cross-view matching. Finally, we design a new hybrid loss function to better constrain these features and to help mine crucial information from the global features. The experimental results indicate that we achieve advanced performance on two common datasets, University-1652 and SUES-200, reaching 89.79% and 95.75% in drone target matching and 94.87% and 98.80% in drone navigation.
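The two-branch design described above (duplicating the global features, gating one copy with spatial attention, and concatenating the branches for fusion) can be sketched as follows. This is a minimal NumPy illustration under assumptions of our own: the attention gate here is a simple sigmoid over channel-wise average and max pooling, and the convolutional branch is replaced by an identity pass-through; the paper's actual modules are learned layers and may differ substantially.

```python
import numpy as np

def spatial_attention(feat):
    """Weight each spatial location of a (C, H, W) feature map.

    Hypothetical sketch: channel-wise average and max pooling followed
    by a sigmoid gate (the learned convolution is omitted for brevity).
    """
    avg = feat.mean(axis=0, keepdims=True)        # (1, H, W)
    mx = feat.max(axis=0, keepdims=True)          # (1, H, W)
    gate = 1.0 / (1.0 + np.exp(-(avg + mx)))      # sigmoid, values in (0, 1)
    return feat * gate                            # broadcast over channels

def fuse_branches(global_feat):
    """Duplicate the global features into two branches and concatenate them."""
    attended = global_feat.copy()
    attended = spatial_attention(attended)        # spatial-attention branch
    plain = global_feat                           # stand-in for the conv branch
    return np.concatenate([attended, plain], axis=0)  # (2C, H, W)

feat = np.random.rand(8, 4, 4)                    # toy backbone output
fused = fuse_branches(feat)
print(fused.shape)                                # (16, 4, 4)
```

The attention branch highlights salient regions (e.g. the target building) while the plain branch preserves the original context; fusing both gives the matcher access to multi-level information.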

Keywords