IEEE Access (Jan 2024)
A Novel One-to-One Framework for Relative Camera Pose Estimation
Abstract
To address the challenge of relative camera pose estimation, many permutation-invariant neural networks have been developed to process sparse correspondences with constant latency. These networks typically use an n-to-n framework, where n putative correspondences from the same image pairs are placed in distinct batch instances without any specific order. This uncorrelated, set-type input structure does not sufficiently facilitate the extraction of contextual information for the correspondences. In this paper, we introduce a novel one-to-one framework designed to maximize context interaction within the network. Our framework prioritizes providing specialized context for each correspondence and enhancing the interaction between context data and correspondence data through a carefully designed input structure and network architecture schema. We conducted a series of experiments using various architectures within the one-to-one framework. Our results demonstrate that one-to-one networks not only match but often surpass the performance of traditional n-to-n networks, highlighting the one-to-one framework’s significant potential and efficacy. To ensure a fair comparison, all one-to-one and n-to-n networks were trained on Google’s Tensor Processing Units (TPUs). Notably, the memory capacity of a single TPUv4 device is sufficient to train the presented one-to-one networks without requiring multiple devices in a TPU pod.
Keywords