Future Internet (May 2024)

TQU-SLAM Benchmark Dataset for Comparative Study to Build Visual Odometry Based on Extracted Features from Feature Descriptors and Deep Learning

  • Thi-Hao Nguyen,
  • Van-Hung Le,
  • Huu-Son Do,
  • Trung-Hieu Te,
  • Van-Nam Phan

DOI
https://doi.org/10.3390/fi16050174
Journal volume & issue
Vol. 16, no. 5
p. 174

Abstract


Enriching the data used to train visual SLAM and visual odometry (VO) models based on deep learning (DL) is an urgent problem in computer vision. DL requires a large amount of training data, and data covering many different contexts and conditions yields more accurate visual SLAM and VO models. In this paper, we introduce the TQU-SLAM benchmark dataset, which includes 160,631 RGB-D frame pairs collected from the corridors of three interconnected buildings with a total length of about 230 m. The ground-truth data of the TQU-SLAM benchmark dataset were prepared manually and include 6-DOF camera poses, 3D point cloud data, intrinsic camera parameters, and the transformation matrix between the camera coordinate system and the real world. We also evaluated the TQU-SLAM benchmark dataset using the PySLAM framework with traditional features such as SHI_TOMASI, SIFT, SURF, ORB, ORB2, AKAZE, KAZE, and BRISK, as well as features extracted by DL models such as VGG, DPVO, and TartanVO. The camera pose estimation results show that the ORB2 features achieve the best accuracy (Err_d = 5.74 mm), while the SHI_TOMASI feature achieves the best ratio of frames with detected keypoints (r_d = 98.97%). We also present and analyze the challenges the TQU-SLAM benchmark dataset poses for building visual SLAM and VO systems.

Keywords