IEEE Access (Jan 2023)
Scale-Aware Visual-Inertial Depth Estimation and Odometry Using Monocular Self-Supervised Learning
Abstract
For real-world applications with a single monocular camera, scale ambiguity is a critical issue. Self-supervised data-driven approaches that use no additional data containing scale information cannot resolve this ambiguity, so state-of-the-art deep-learning-based methods learn the scale from additional sensor measurements. In that regard, the inertial measurement unit (IMU) is a popular sensor for various mobile platforms due to its light weight and low cost. However, unlike supervised learning, which can learn the scale from ground-truth information, learning the scale from an IMU is challenging in a self-supervised setting. We propose a scale-aware monocular visual-inertial depth estimation and odometry method with end-to-end training. To learn the scale from IMU measurements with end-to-end training in the monocular self-supervised setup, we propose a new loss function, named the preintegration loss, which trains scale-aware ego-motion by comparing the ego-motion integrated from IMU measurements with the predicted ego-motion. Since gravity and the sensor biases must be compensated to obtain the ego-motion by integrating IMU measurements, we design a network that predicts the gravity and the biases in addition to the ego-motion and the depth map. The overall performance of the proposed method is compared with state-of-the-art methods on the popular outdoor KITTI driving dataset and on an author-collected indoor driving dataset. On KITTI, the proposed method shows competitive performance against state-of-the-art monocular depth estimation and odometry methods, i.e., a root-mean-square error of 5.435 m on the KITTI Eigen split and an absolute trajectory error of 22.46 m and 0.2975 degrees on KITTI odometry sequence 09. Unlike other up-to-scale monocular methods, the proposed method estimates metric-scaled depth and camera poses.
Additional experiments on the author-collected indoor driving dataset qualitatively confirm the accuracy of the metric depth and pose estimates.
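The preintegration loss described in the abstract can be sketched as follows. This is a minimal illustration only, assuming simple Euler integration, small-angle rotation updates, and an L2-style pose discrepancy; the function names and the exact loss form are assumptions for exposition, not the paper's implementation:

```python
import numpy as np

def skew(w):
    """Skew-symmetric matrix of a 3-vector, used for rotation updates."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def preintegrate_imu(accels, gyros, dt, gravity, acc_bias, gyro_bias):
    """Integrate bias- and gravity-compensated IMU samples into a
    relative rotation R and translation p (Euler integration sketch)."""
    R = np.eye(3)      # accumulated rotation
    v = np.zeros(3)    # accumulated velocity
    p = np.zeros(3)    # accumulated translation
    for a, w in zip(accels, gyros):
        # bias-corrected acceleration rotated to the world frame,
        # with the (predicted) gravity compensated
        a_corr = R @ (a - acc_bias) + gravity
        p = p + v * dt + 0.5 * a_corr * dt ** 2
        v = v + a_corr * dt
        # small-angle rotation update from the bias-corrected gyro rate
        R = R @ (np.eye(3) + skew(w - gyro_bias) * dt)
    return R, p

def preintegration_loss(pred_R, pred_t, imu_R, imu_t):
    """Discrepancy between the network's ego-motion and the ego-motion
    integrated from IMU measurements (illustrative L2 form)."""
    rot_err = np.linalg.norm(pred_R - imu_R)
    trans_err = np.linalg.norm(pred_t - imu_t)
    return rot_err + trans_err
```

Because the IMU-integrated translation is metric, minimizing this discrepancy pushes the predicted ego-motion (and hence the jointly trained depth) toward metric scale, which is the role the abstract assigns to the preintegration loss.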
Keywords