IEEE Access (Jan 2024)

Occlusion-Robust Pallet Pose Estimation for Warehouse Automation

  • Van-Duc Vu,
  • Dinh-Dai Hoang,
  • Phan Xuan Tan,
  • Van-Thiep Nguyen,
  • Thu-Uyen Nguyen,
  • Ngoc-Anh Hoang,
  • Khanh-Toan Phan,
  • Duc-Thanh Tran,
  • Duy-Quang Vu,
  • Phuc-Quan Ngo,
  • Quang-Tri Duong,
  • Anh-Nhat Nguyen,
  • Dinh-Cuong Hoang

DOI
https://doi.org/10.1109/ACCESS.2023.3348781
Journal volume & issue
Vol. 12
pp. 1927 – 1942

Abstract


Accurate detection and estimation of pallet poses from color and depth data (RGB-D) are integral components of many advanced intelligent warehouse systems. State-of-the-art object pose estimation methods follow a two-stage process, relying on off-the-shelf segmentation or object detection in the initial stage and subsequently predicting the pose of objects using cropped images. The cropped patches may include both the target object and irrelevant information, such as background or other objects, leading to challenges in handling pallets in warehouse settings with heavy occlusions from loaded objects. In this study, we propose an innovative deep learning-based approach to address the occlusion problem in pallet pose estimation from RGB-D images. Inspired by the selective attention mechanism in human perception, our model learns to identify and attenuate the significance of features in occluded regions, focusing on the visible and informative areas for accurate pose estimation. Instead of directly estimating pallet poses from cropped patches as in existing methods, we introduce two feature map re-weighting modules with cross-modal attention. These modules effectively filter out features from occluded regions and the background, enhancing pose estimation accuracy. Furthermore, we introduce a large-scale annotated pallet dataset specifically designed to capture occlusion scenarios in warehouse environments, facilitating comprehensive training and evaluation. Experimental results on the newly collected pallet dataset show that our proposed method increases accuracy by 13.5% compared to state-of-the-art methods.
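
The abstract describes re-weighting RGB feature maps with attention derived from the depth modality so that occluded regions contribute less to pose estimation. The sketch below is a minimal illustration of that general idea, not the authors' actual module: the class name CrossModalReweighting, the channel/spatial gating design, and all hyperparameters are assumptions made for clarity.

```python
import torch
import torch.nn as nn


class CrossModalReweighting(nn.Module):
    """Illustrative cross-modal attention re-weighting block (hypothetical,
    not the paper's exact architecture): RGB features are re-weighted by
    attention maps computed from depth features, attenuating channels and
    locations dominated by occluders or background."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # Bottleneck on globally pooled depth features -> per-channel weights.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # 1x1 convolution on depth features -> per-pixel weights.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        # Weights in [0, 1] from the depth branch down-weight occluded regions.
        channel_w = self.channel_gate(depth_feat)   # shape (B, C, 1, 1)
        spatial_w = self.spatial_gate(depth_feat)   # shape (B, 1, H, W)
        return rgb_feat * channel_w * spatial_w


if __name__ == "__main__":
    # Toy check with random feature maps of matching shape.
    block = CrossModalReweighting(channels=64)
    rgb = torch.randn(2, 64, 32, 32)
    depth = torch.randn(2, 64, 32, 32)
    print(block(rgb, depth).shape)  # torch.Size([2, 64, 32, 32])
```

In this reading, the multiplicative gates play the role of the "selective attention" described in the abstract: features in occluded or background regions are scaled toward zero before pose regression, so the downstream head relies mainly on visible pallet structure.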

Keywords