Electronic Research Archive (May 2024)
Hybrid self-supervised monocular visual odometry system based on spatio-temporal features
Abstract
For the autonomous and intelligent operation of robots in unknown environments, simultaneous localization and mapping (SLAM) is essential. Since the proposal of visual odometry, the use of visual odometry in the mapping process has greatly advanced the development of pure visual SLAM techniques. However, the main challenges in current monocular odometry algorithms are the poor generalization of traditional methods and the low interpretability of deep learning-based methods. This paper presented a hybrid self-supervised visual monocular odometry framework that combined geometric principles and multi-frame temporal information. Moreover, a post-odometry optimization module was proposed. By using image synthesis techniques to insert synthetic views between the two frames undergoing pose estimation, more accurate inter-frame pose estimation was achieved. Compared to other public monocular algorithms, the proposed approach showed reduced average errors in various scene sequences, with a translation error of $ 2.211\% $ and a rotation error of $ 0.418\; ^{\circ}/100m $. With the help of the proposed optimizer, the precision of the odometry algorithm was further improved, with a relative decrease of approximately 10$ \% $ intranslation error and 15$ \% $ in rotation error.
Keywords