IEEE Access (Jan 2024)

Object Pose Estimation Using Color Images and Predicted Depth Maps

  • Dinh-Cuong Hoang,
  • Phan Xuan Tan,
  • Anh-Nhat Nguyen,
  • Duy-Quang Vu,
  • Van-Duc Vu,
  • Thu-Uyen Nguyen,
  • Quang-Tri Duong,
  • Van-Thiep Nguyen,
  • Ngoc-Anh Hoang,
  • Khanh-Toan Phan,
  • Duc-Thanh Tran,
  • Ngoc-Trung Ho,
  • Cong-Trinh Tran,
  • Van-Hiep Duong,
  • Phuc-Quan Ngo

DOI: https://doi.org/10.1109/ACCESS.2024.3397715
Journal volume & issue: Vol. 12, pp. 65444–65461

Abstract

Object pose estimation in computer vision typically relies on both color (RGB) and depth (D) images: depth supplies the geometric information that helps algorithms reason about occlusions and object shape, improving accuracy. However, the specialized sensors needed to capture depth raise issues of cost and availability, so researchers have explored estimating object poses from RGB images alone. RGB-only methods, in turn, struggle to handle occlusions, discern object geometry, and resolve ambiguities arising from similar color or texture patterns. This paper introduces a geometry-aware method that takes RGB images as input and estimates the poses of multiple object instances. Our approach leverages both depth and color images during training but relies only on color images during inference. Rather than using a depth sensor, it computes predicted point clouds directly from depth maps estimated from the RGB input. A key contribution is a multi-scale fusion module that integrates features extracted from the RGB image with features inferred from the predicted point cloud; this fusion strengthens the pose estimation pipeline by combining the strengths of both modalities, yielding notably improved object poses. Extensive experiments show that our approach markedly outperforms state-of-the-art RGB-based methods on the Occluded-LINEMOD and YCB-Video datasets, and achieves competitive results against RGB-D approaches that require both RGB and depth data from physical sensors.
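The abstract names two concrete steps: back-projecting an estimated depth map into a point cloud via the pinhole camera model, and fusing RGB features with geometric features at multiple scales. The sketch below illustrates those two ideas only; it is not the authors' implementation. The function names (`depth_to_point_cloud`, `MultiScaleFusion`), channel counts, the concatenation-plus-1x1-convolution fusion scheme, and the intrinsics values are all illustrative assumptions.

```python
# Minimal sketch (not the paper's code): back-project a predicted depth map
# with the pinhole model, then fuse RGB and geometric feature maps at each
# scale. All shapes, channels, and intrinsics are illustrative assumptions.
import torch
import torch.nn as nn

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map into an (H*W, 3) point cloud."""
    h, w = depth.shape
    v, u = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    z = depth
    x = (u.float() - cx) * z / fx   # pinhole model: X = (u - cx) * Z / fx
    y = (v.float() - cy) * z / fy   #                Y = (v - cy) * Z / fy
    return torch.stack([x, y, z], dim=-1).reshape(-1, 3)

class MultiScaleFusion(nn.Module):
    """Concatenate per-pixel RGB and geometric features at each scale and
    mix them with a 1x1 convolution (a simple stand-in for a learned
    fusion module)."""
    def __init__(self, channels=(64, 128, 256)):
        super().__init__()
        self.mix = nn.ModuleList(
            nn.Conv2d(2 * c, c, kernel_size=1) for c in channels
        )

    def forward(self, rgb_feats, geo_feats):
        # rgb_feats / geo_feats: lists of (B, C_i, H_i, W_i) tensors
        return [m(torch.cat([r, g], dim=1))
                for m, r, g in zip(self.mix, rgb_feats, geo_feats)]

if __name__ == "__main__":
    # Fake predicted depth in meters; intrinsics are placeholder values.
    depth = torch.rand(480, 640) * 2.0
    cloud = depth_to_point_cloud(depth, fx=572.4, fy=573.6, cx=325.3, cy=242.0)
    print(cloud.shape)  # torch.Size([307200, 3])

    fusion = MultiScaleFusion()
    rgb = [torch.rand(1, c, 60 // s, 80 // s)
           for c, s in zip((64, 128, 256), (1, 2, 4))]
    geo = [torch.rand_like(f) for f in rgb]
    print([f.shape for f in fusion(rgb, geo)])
```

In this toy version the fused maps keep each scale's original channel count, so they could drop into an existing multi-scale pose head; the paper's actual fusion module may differ substantially.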

Keywords