IEEE Access (Jan 2024)

RGB-Based Set Prediction Transformer of 6D Pose Estimation for Robotic Grasping Application

  • Xiao Ning,
  • Beining Yang,
  • Si Huang,
  • Zhenzhe Zhang,
  • Binhui Pan

DOI
https://doi.org/10.1109/ACCESS.2024.3462970
Journal volume & issue
Vol. 12
pp. 138047 – 138060

Abstract


Precise pose estimation of textureless objects from RGB images alone, without depth information, remains a significant challenge in computer vision. This paper introduces an RGB-based 6D pose estimation method designed for six-degree-of-freedom (6-DoF) robotic grasping. We propose a novel network, the Region-based Keypoint Detection Transformer, which builds on the concept of set prediction. The network generates numerous set points that together form a fuzzy representation of the keypoints. A voting mechanism interprets these set points as keypoints, which are then used for PnP-based 6D pose solving and refinement to guide precise robotic grasping. To train the set points, we introduce specialized losses, including a set-based coordinate loss and a soft index probability loss for multi-label classification. Evaluated on the LineMOD dataset, our method achieves an ADD(-S) accuracy of 92.92% and a 2D projection accuracy of 98.72%. We also conducted practical robotic grasping experiments in real-world environments, covering static grasping, dynamic grasping, and bin-picking tasks. The picking success rate and completion rate reached 93.02% and 100.00%, respectively, with the success rate exceeding that of the best baseline by 2.11%. These results demonstrate the accuracy and robustness of our method in cluttered environments. The code is available at https://github.com/Nx1021/RKDT.
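The two LineMOD figures above are accuracies under the standard ADD(-S) and 2D projection metrics. The sketch below illustrates how these metrics are commonly computed; it is not the authors' evaluation code. It assumes the conventional thresholds from the LineMOD literature (a pose counts as correct if the ADD or ADD-S error is below 10% of the object diameter, or if the mean 2D reprojection error is below 5 pixels); the abstract does not state which thresholds the authors used. All function names are hypothetical, the camera intrinsics in the usage check are example values, and the nearest-point search for ADD-S is a brute-force loop kept in pure Python for clarity.

```python
import math
from typing import List, Sequence

Vec3 = Sequence[float]
Mat3 = Sequence[Sequence[float]]


def transform(points: List[Vec3], R: Mat3, t: Vec3) -> List[List[float]]:
    """Apply the rigid transform x -> R x + t to every 3D model point."""
    return [[sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
            for p in points]


def diameter(points: List[Vec3]) -> float:
    """Object diameter: the largest distance between any two model points."""
    return max(math.dist(a, b) for a in points for b in points)


def add_error(points, R_gt, t_gt, R_pr, t_pr) -> float:
    """ADD: mean distance between corresponding transformed model points."""
    gt = transform(points, R_gt, t_gt)
    pr = transform(points, R_pr, t_pr)
    return sum(math.dist(a, b) for a, b in zip(gt, pr)) / len(points)


def adds_error(points, R_gt, t_gt, R_pr, t_pr) -> float:
    """ADD-S (symmetric objects): mean distance to the closest predicted point."""
    gt = transform(points, R_gt, t_gt)
    pr = transform(points, R_pr, t_pr)
    return sum(min(math.dist(a, b) for b in pr) for a in gt) / len(points)


def proj2d_error(points, R_gt, t_gt, R_pr, t_pr, fx, fy, cx, cy) -> float:
    """2D projection: mean pixel distance between projected model points."""
    def project(pts):
        # Pinhole projection; assumes every point lies in front of the camera (z > 0).
        return [(fx * x / z + cx, fy * y / z + cy) for x, y, z in pts]
    gt = project(transform(points, R_gt, t_gt))
    pr = project(transform(points, R_pr, t_pr))
    return sum(math.dist(a, b) for a, b in zip(gt, pr)) / len(points)


def add_correct(err: float, diam: float, ratio: float = 0.1) -> bool:
    """Assumed convention: correct if the error is below 10% of the diameter."""
    return err < ratio * diam


def proj2d_correct(err: float, pixel_threshold: float = 5.0) -> bool:
    """Assumed convention: correct if the mean reprojection error is below 5 px."""
    return err < pixel_threshold
```

As a sanity check, for the eight corners of a 100 mm cube placed 1000 mm from the camera, a prediction that is off by 2 mm along x gives an ADD error of 2.0 mm, well under 10% of the cube's diameter (about 173.2 mm). A prediction rotated 90° about the z-axis gives an ADD error of 100 mm but an ADD-S error of 0, because the rotated cube coincides with itself. This is why ADD-S is used for symmetric objects.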

Keywords