IEEE Access (Jan 2023)
Cross-View Geo-Localization for Autonomous UAV Using Locally-Aware Transformer-Based Network
Abstract
Although GPS is commonly used for the autonomous flight of unmanned aerial vehicles (UAVs), much research focuses on image-based localization methods because of their advantages in GPS-denied environments. In this work, we study the problem of image-based geo-localization between UAV and satellite views (known as cross-view geo-localization), which is an essential step towards image-based localization. In cross-view geo-localization, extracting fine-grained features that contain contextual information is challenging because of the large gap in visual representation between the different views. Existing methods in this field often use convolutional neural networks (CNNs) as feature extractors. However, CNNs have limited receptive fields, which leads to the loss of fine-grained information. Some researchers have adopted Transformer-based networks to overcome this limitation. However, these approaches focus only on interpreting each pixel through attention and only partially utilize the tokens produced by the Transformer blocks. In contrast to these works, we propose a Vision Transformer-based network that takes advantage of local tokens, especially the classification token. In our experiments, the proposed model significantly outperforms existing state-of-the-art models, which shows promise for the further development of this method.
Keywords