International Journal of Applied Earth Observations and Geoinformation (Aug 2024)

Enhancing visual localization with only imperfect 3D models: A new perspective through neural rendering iterations

  • Sikang Liu,
  • Zhenqi Zheng,
  • Xueli Guo,
  • Zhichao Wen,
  • Yuan Zhuang,
  • You Li

Journal volume & issue
Vol. 132
p. 103987

Abstract

Visual localization, the problem of estimating camera pose, is a core component of indoor navigation applications. Methods based on image feature databases or sparse SfM point clouds can provide accurate results but suffer from limitations such as dependence on the local feature type and high memory consumption. State-of-the-art 3D-model-based visual localization methods can overcome these limitations but still face challenges. They require expensive specialized equipment (e.g., LiDAR or RGB-D cameras) and complex model optimization to acquire high-precision models as offline maps, and the accuracy of those models directly limits localization accuracy. Furthermore, the requirement for high-precision models hinders ubiquitous localization for mass-market users. A key question is therefore whether models reconstructed from images captured by mass-market terminal cameras can achieve localization accuracy close to that of high-precision models. Improving the accuracy of models reconstructed from images is difficult. By contrast, thanks to the development of neural radiance field (NeRF) rendering, improving the accuracy of rendered synthetic images is relatively simple. Based on this idea, we propose a simple and flexible scheme that enhances visual localization with only imperfect 3D models, from a new perspective, through neural rendering iterations. In terms of scheme design, we propose, for the first time, using imperfect 3D models produced by commercial reconstruction algorithms, instead of high-precision 3D models, for visual localization. This avoids expensive professional equipment and makes the approach more widely applicable. In terms of scheme optimization, to bypass the complex refinement of commercial 3D models and further improve the initial localization accuracy, we propose a neural rendering iterative improved visual localization (NerfIVL) method. This method iteratively updates the pose using the pixel differences between the rendered and captured images, yielding finer localization results. Experiments on 12 datasets from 12Scenes show that the localization accuracy of our method differs from that obtained with high-precision 3D models reconstructed by professional RGB-D cameras by only 2.93% to 17.05%, with an average difference of less than 8% for six datasets. This result indicates that comparable visual localization accuracy can be achieved.
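
The render-and-compare idea described above (update the pose from the pixel differences between a rendered and a captured image) can be illustrated with a toy sketch. This is not the NerfIVL implementation, whose details the abstract does not give. As assumptions for illustration only: the "pose" is reduced to a 2D image-plane translation instead of a 6-DoF camera pose, the hypothetical `render` function draws a few Gaussian blobs in place of a neural renderer, and the update is a Gauss-Newton step with a finite-difference Jacobian. The names `render`, `refine_pose`, `BLOBS`, and `SIGMA` are invented here.

```python
import numpy as np

H, W = 64, 64          # toy image size (assumption)
SIGMA = 4.0            # blob width in pixels (assumption)
# Hypothetical "scene": centres (x, y) of three Gaussian blobs.
BLOBS = np.array([[20.0, 20.0], [44.0, 30.0], [30.0, 48.0]])


def render(pose):
    """Toy stand-in for a neural renderer.

    `pose` is a 2D translation (tx, ty); a real system would use a
    6-DoF camera pose and a NeRF-style renderer instead.
    """
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    img = np.zeros((H, W))
    for cx, cy in BLOBS:
        dx = xs - cx - pose[0]
        dy = ys - cy - pose[1]
        img += np.exp(-(dx**2 + dy**2) / (2.0 * SIGMA**2))
    return img


def refine_pose(captured, pose0, iters=20, eps=1e-3):
    """Refine a coarse pose by minimising rendered-vs-captured pixel differences."""
    pose = np.asarray(pose0, dtype=float).copy()
    for _ in range(iters):
        # Pixel-wise difference between the rendered and captured images.
        residual = (render(pose) - captured).ravel()
        # Finite-difference Jacobian of the rendered pixels w.r.t. the pose.
        jac = np.empty((residual.size, pose.size))
        for k in range(pose.size):
            step = np.zeros_like(pose)
            step[k] = eps
            jac[:, k] = (render(pose + step) - render(pose - step)).ravel() / (2.0 * eps)
        # Gauss-Newton update: least-squares solution of jac @ delta = -residual.
        delta = np.linalg.lstsq(jac, -residual, rcond=None)[0]
        pose += delta
        if np.linalg.norm(delta) < 1e-9:
            break
    return pose


true_pose = np.array([3.0, -2.0])      # pose that produced the "captured" image
captured = render(true_pose)           # stands in for the real camera image
coarse_pose = np.array([0.0, 0.0])     # stands in for the initial localization result
refined_pose = refine_pose(captured, coarse_pose)
print("coarse error :", np.linalg.norm(coarse_pose - true_pose))
print("refined error:", np.linalg.norm(refined_pose - true_pose))
```

In this sketch the coarse pose plays the role of the initial localization obtained from the imperfect 3D model, and the loop plays the role of the rendering iterations. With real images the residual never reaches zero, so a practical system would likely need a robust loss and a differentiable renderer rather than finite differences.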

Keywords