Hourglass 3D CNN for Stereo Disparity Estimation for Mobile Robots

Thai La; Linh Tao; Chanh Minh Tran; Tho Nguyen Duc; Eiji Kamioka; Phan Xuan Tan

doi:10.3390/app131910677

Applied Sciences (Sep 2023)

Affiliations

Thai La: School of Mechanical Engineering, Hanoi University of Science and Technology, Hanoi 100000, Vietnam
Linh Tao: School of Mechanical Engineering, Hanoi University of Science and Technology, Hanoi 100000, Vietnam
Chanh Minh Tran: Graduate School of Engineering and Science, Shibaura Institute of Technology, Tokyo 135-8548, Japan
Tho Nguyen Duc: Graduate School of Engineering and Science, Shibaura Institute of Technology, Tokyo 135-8548, Japan
Eiji Kamioka: Department of Information and Communications Engineering, Shibaura Institute of Technology, Tokyo 135-8548, Japan
Phan Xuan Tan: Department of Information and Communications Engineering, Shibaura Institute of Technology, Tokyo 135-8548, Japan

DOI: https://doi.org/10.3390/app131910677
Journal volume & issue: Vol. 13, no. 19, p. 10677

Abstract

Stereo cameras allow mobile robots to perceive depth in their surroundings by capturing two images from slightly different perspectives. Depth perception is necessary for tasks such as obstacle avoidance, navigation, and spatial mapping. Existing work on stereo-camera-based depth estimation has achieved strong results by using convolutional neural networks (CNNs). However, depth estimation on mobile robots critically requires a good tradeoff between computational cost and accuracy. To achieve such a tradeoff, attention-aware feature aggregation (AAFS) has been proposed for real-time stereo matching on edge devices. AAFS comprises multistage feature extraction, an attention module, and a 3D CNN architecture. However, its 3D CNN architecture learns contextual information ineffectively. In this paper, a deep encoder–decoder architecture is applied to the AAFS 3D CNN to improve depth estimation accuracy. Evaluation shows that the proposed 3D CNN architecture provides significantly better accuracy while keeping the inference time comparable to that of AAFS.
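
The link between stereo disparity and depth mentioned in the abstract can be illustrated with the standard pinhole relation for a rectified stereo pair, depth = focal length × baseline / disparity. The sketch below is only an illustration of that relation, not the paper's method; the function name `disparity_to_depth` and the camera parameters (700 px focal length, 0.12 m baseline) are hypothetical values chosen for the example.

```python
def disparity_to_depth(disparity_px, focal_length_px, baseline_m):
    """Convert a disparity in pixels to a depth in meters (rectified stereo pair)."""
    if disparity_px < 0:
        raise ValueError("disparity must be non-negative")
    if disparity_px == 0:
        # Zero disparity corresponds to a point infinitely far away.
        return float("inf")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical camera parameters, assumed for illustration only.
FOCAL_LENGTH_PX = 700.0
BASELINE_M = 0.12

# Halving the disparity doubles the estimated depth.
depths = [
    disparity_to_depth(d, FOCAL_LENGTH_PX, BASELINE_M)
    for d in (84.0, 42.0, 21.0)
]
```

Because depth is inversely proportional to disparity, a fixed disparity error costs more depth accuracy for distant objects than for nearby ones, which is one reason disparity accuracy matters for obstacle avoidance.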

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
