Applied Sciences (Jun 2024)
Robot Grasp Detection with Loss-Guided Collaborative Attention Mechanism and Multi-Scale Feature Fusion
Abstract
Grasp detection is fundamental to successful grasping in robotic systems. The encoder–decoder structure has been widely adopted as the base architecture for grasp detection networks because of its speed and accuracy. However, traditional network structures fail to extract the features essential for accurate grasp poses and do not eliminate the checkerboard artifacts caused by transposed convolution during decoding. To overcome these challenges, we propose a novel generative grasp detection network (LGAR-Net2). A bilinear interpolation layer replaces the transposed convolution layer in the decoder, which removes the uneven overlap and consequently eliminates checkerboard artifacts. In addition, a loss-guided collaborative attention (LGCA) block, which combines attention blocks with spatial pyramid blocks to focus on important regions of the image, is constructed to improve the accuracy of information extraction. Validated on the public Cornell dataset with RGB images as input, LGAR-Net2 achieves an accuracy of 97.7%, an improvement of 1.1% over the baseline network, and processes a single RGB image in just 15 ms.
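The uneven overlap behind checkerboard artifacts can be illustrated in one dimension. The sketch below is not the LGAR-Net2 decoder; the function names `transposed_conv_overlap` and `linear_upsample_1d` are hypothetical, and the kernel sizes and stride are chosen only for illustration. It counts how many kernel taps land on each output position of a transposed convolution, and contrasts this with linear interpolation, whose weights for every output sum to one.

```python
def transposed_conv_overlap(n_in, kernel, stride):
    """Count how many kernel taps contribute to each output position
    of a 1D transposed convolution (no padding). Illustrative only."""
    n_out = (n_in - 1) * stride + kernel
    counts = [0] * n_out
    for i in range(n_in):          # each input element ...
        for k in range(kernel):    # ... is spread over `kernel` outputs
            counts[i * stride + k] += 1
    return counts


def linear_upsample_1d(signal, factor):
    """Upsample by linear interpolation between neighbouring samples.
    Each output is a convex combination of at most two inputs."""
    n_out = (len(signal) - 1) * factor + 1
    out = []
    for j in range(n_out):
        pos = j / factor
        left = int(pos)
        right = min(left + 1, len(signal) - 1)
        frac = pos - left
        out.append((1 - frac) * signal[left] + frac * signal[right])
    return out


# Kernel size 3 is not divisible by stride 2: overlap alternates 1, 2, 1, 2, ...
uneven = transposed_conv_overlap(n_in=4, kernel=3, stride=2)

# Kernel size 4 is divisible by stride 2: interior overlap is uniform.
even = transposed_conv_overlap(n_in=4, kernel=4, stride=2)

# A constant input stays constant under interpolation, so no periodic pattern appears.
flat = linear_upsample_1d([1.0, 1.0, 1.0, 1.0], factor=2)

print(uneven)
print(even)
print(flat)
```

In the first case the alternating contribution count is what produces a periodic intensity pattern in two dimensions, whereas interpolation-based upsampling applies the same total weight to every output pixel.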
Keywords