IEEE Access (Jan 2024)

Multi-Modal Robust Geometry Primitive Shape Scene Abstraction for Grasp Detection

  • Hamed Hosseini,
  • Mohammadhossein Koosheshi,
  • Mehdi Tale Masouleh,
  • Ahmad Kalhor

DOI
https://doi.org/10.1109/ACCESS.2024.3458800
Journal volume & issue
Vol. 12
pp. 130117 – 130134

Abstract

Scene understanding is essential for a wide range of robotic tasks, such as grasping. Simplifying the scene into predefined forms helps the robot perform its task more reliably, especially in an unknown environment. This paper proposes a combination of simulation-based and real-world datasets for domain adaptation and for grasping in practical settings. Depth images, as used in previous studies, do not represent object boundaries clearly; to compensate for this weakness, the RGB image is also fed to the network, giving RGB and RGB-D input modalities. The implemented architecture is based on Mask R-CNN with a ResNet101 backbone. With RGB and RGB-D images as input, the proposed approach improves the segmentation Dice score over primitive shape abstraction by 3.73% and 6.19%, respectively. Moreover, to improve and evaluate the robustness of the model to occlusion and to the variety of primitive shapes and colors that may occur in a scene, different versions of simulation-based datasets are generated using the CoppeliaSim simulator. Additionally, a real-world primitive shape abstraction dataset is created to make the model more robust to more complex objects and in real-world experiments. To generalize the model to a wider range of objects, new primitive shapes, such as cones, and both filled and hollow variants of each primitive shape are considered. Subsequently, point clouds of the segmented parts are generated, and the ICP algorithm is used to derive the 6-DOF grasp parameters from reference primitive shapes and their predefined grasps. Simulation experiments in the CoppeliaSim environment yield a 95% grasp success rate on unseen objects. A Delta parallel robot and a fabricated two-fingered gripper are used for practical experiments. These experiments yield a 98% grasp success rate on common objects used in baseline evaluations, outperforming the state of the art by 2%. Real-world tests also include scenes with multiple objects and cluttered scenes.
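
For readers unfamiliar with the segmentation metric quoted above, the following is a minimal sketch of how a Dice score between a predicted and a ground-truth binary mask is commonly computed, Dice = 2|A ∩ B| / (|A| + |B|). It is an illustration only, not the authors' evaluation code; the function name `dice_score`, the toy masks, and the convention of returning 1.0 for two empty masks are assumptions made here.

```python
import numpy as np


def dice_score(pred_mask, true_mask):
    """Dice coefficient between two binary masks of identical shape.

    Hypothetical helper for illustration; not taken from the paper.
    """
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    if pred.shape != true.shape:
        raise ValueError("masks must have the same shape")

    intersection = np.logical_and(pred, true).sum()
    total = pred.sum() + true.sum()
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return float(2.0 * intersection / total)


# Toy 3x3 masks: 3 predicted pixels, 3 true pixels, 2 of them overlapping.
pred = np.array([[1, 1, 0],
                 [0, 1, 0],
                 [0, 0, 0]])
true = np.array([[1, 1, 0],
                 [0, 0, 0],
                 [0, 1, 0]])

score = dice_score(pred, true)  # 2 * 2 / (3 + 3)
```

In an instance-segmentation setting such as the one described, a score like this would typically be computed per primitive-shape mask and then averaged; how the paper aggregates it is not stated in the abstract.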

Keywords