CAAI Transactions on Intelligence Technology (Mar 2023)

Multi‐scale attention encoder for street‐to‐aerial image geo‐localization

  • Songlian Li,
  • Zhigang Tu,
  • Yujin Chen,
  • Tan Yu

DOI
https://doi.org/10.1049/cit2.12077
Journal volume & issue
Vol. 8, no. 1
pp. 166 – 176

Abstract

The goal of street‐to‐aerial cross‐view image geo‐localization is to determine the location of a query street‐view image by retrieving the aerial‐view image of the same place. The drastic viewpoint and appearance gap between aerial‐view and street‐view images makes this task highly challenging. In this paper, we propose a novel multi‐scale attention encoder to capture the multi‐scale contextual information of the aerial‐ and street‐view images. To bridge the domain gap between the two views, we first use an inverse polar transform to approximately align the street‐view images with the aerial‐view images. Then, the proposed multi‐scale attention encoder converts each image into a feature representation under the guidance of the learnt multi‐scale information. Finally, we propose a novel global mining strategy that enables the network to pay more attention to hard negative exemplars. Experiments on standard benchmark datasets show that our approach obtains a top‐1 recall rate of 81.39% on the CVUSA dataset and 71.52% on the CVACT dataset, achieving state‐of‐the‐art performance and significantly outperforming most existing methods.
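The top‐1 recall rate reported above is a retrieval metric: the fraction of street‐view queries whose true aerial match is ranked first among all aerial candidates. The following is a minimal sketch of how such a top‐k recall could be computed, not the authors' evaluation code. It assumes, for illustration only, that each image is already encoded as a feature vector, that query i matches reference i, and that similarity is measured by cosine similarity; the function names and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_recall(street_feats, aerial_feats, k=1):
    """Fraction of street-view queries whose true aerial match ranks in the top k.

    Assumption (illustrative): street_feats[i] and aerial_feats[i] depict
    the same place, so the ground-truth match of query i is reference i.
    """
    hits = 0
    for i, query in enumerate(street_feats):
        sims = [cosine_similarity(query, ref) for ref in aerial_feats]
        true_sim = sims[i]
        # Rank of the true match = number of references strictly more similar.
        rank = sum(1 for s in sims if s > true_sim)
        if rank < k:
            hits += 1
    return hits / len(street_feats)


# Toy features (made up for illustration, not data from the paper).
street = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
aerial = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.7]]

recall_at_1 = top_k_recall(street, aerial, k=1)
recall_at_2 = top_k_recall(street, aerial, k=2)
```

In this toy example the second query ranks a non‐matching aerial image above its true match, so it counts as a miss at k=1 and as a hit at k=2.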

Keywords