IEEE Access (Jan 2024)
Back to the Context: Tuning Visual Localization Using Structural and Edge Context Within Image
Abstract
Visual localization has become a crucial task in robotics, especially for autonomous vehicles and virtual reality, because it achieves high accuracy with inexpensive sensors. Among the various methods, scene coordinate regression is a recent approach: a neural network regresses 2D-3D correspondences from an image, and a pose solver such as PnP-RANSAC uses these correspondences to estimate the pose of the query image. A common challenge is that regressing these correspondences often involves sampling across the entire 2D image, which is inefficient because not all regions contain information useful to the network. To address this, we propose sampling only the essential regions of an image to improve the network’s learning efficiency. Our method selectively captures informative features by integrating the structural and edge contexts within an image to identify robust regions for sampling. This refinement allows the network to learn 2D-3D correspondences more effectively. We evaluated our approach on a publicly available outdoor dataset and on our own custom dataset, achieving state-of-the-art results on a large dataset.
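The idea of sampling only informative regions instead of the whole image can be illustrated with a minimal sketch. It is not the method of this paper: a plain gradient-magnitude threshold stands in for the structural and edge context, and the function names, the toy image, and the threshold value are all assumptions made for illustration.

```python
def gradient_magnitude(image):
    """Central-difference gradient magnitude of a 2D grayscale image.

    `image` is a list of equally long rows; border pixels are left at 0.
    """
    h, w = len(image), len(image[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (image[y][x + 1] - image[y][x - 1]) / 2.0
            gy = (image[y + 1][x] - image[y - 1][x]) / 2.0
            mag[y][x] = (gx * gx + gy * gy) ** 0.5
    return mag


def sample_edge_pixels(image, threshold):
    """Return (x, y) coordinates whose gradient magnitude exceeds `threshold`.

    Hypothetical stand-in for "robust region" selection: only these pixels
    would be passed on as candidates for 2D-3D correspondence regression.
    """
    mag = gradient_magnitude(image)
    return [
        (x, y)
        for y, row in enumerate(mag)
        for x, m in enumerate(row)
        if m > threshold
    ]


# Toy 6x6 image: dark left half, bright right half, i.e. one vertical edge.
image = [[0, 0, 0, 255, 255, 255] for _ in range(6)]
pixels = sample_edge_pixels(image, threshold=50.0)
total = len(image) * len(image[0])
print(f"sampled {len(pixels)} of {total} pixels")
```

In this toy case only the pixels next to the intensity step are kept, so the sampler looks at a small fraction of the image. In a full pipeline, the 2D-3D correspondences predicted at the selected pixels would then go to a pose solver such as PnP-RANSAC.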
Keywords