Future Internet (May 2024)

TQU-SLAM Benchmark Dataset for Comparative Study to Build Visual Odometry Based on Extracted Features from Feature Descriptors and Deep Learning

  • Thi-Hao Nguyen,
  • Van-Hung Le,
  • Huu-Son Do,
  • Trung-Hieu Te,
  • Van-Nam Phan

DOI
https://doi.org/10.3390/fi16050174
Journal volume & issue
Vol. 16, no. 5
p. 174

Abstract


Enriching the data used to train visual SLAM and visual odometry (VO) models based on deep learning (DL) is an urgent problem in computer vision. DL requires a large amount of training data, and data covering many different contexts and conditions yields more accurate visual SLAM and VO models. In this paper, we introduce the TQU-SLAM benchmark dataset, which includes 160,631 RGB-D frame pairs collected from the corridors of three interconnected buildings with a total length of about 230 m. The ground-truth data of the TQU-SLAM benchmark dataset were prepared manually and include 6-DOF camera poses, 3D point cloud data, intrinsic camera parameters, and the transformation matrix between the camera coordinate system and the real world. We also evaluated the TQU-SLAM benchmark dataset using the PySLAM framework with traditional features such as SHI_TOMASI, SIFT, SURF, ORB, ORB2, AKAZE, KAZE, and BRISK, as well as features extracted by DL models such as VGG, DPVO, and TartanVO. The camera pose estimation results show that the ORB2 features achieve the best accuracy (Err_d = 5.74 mm), while the SHI_TOMASI feature achieves the best ratio of frames with detected keypoints (r_d = 98.97%). We also present and analyze the challenges the TQU-SLAM benchmark dataset poses for building visual SLAM and VO systems.

Keywords