Agronomy (Dec 2024)
TomatoPoseNet: An Efficient Keypoint-Based 6D Pose Estimation Model for Non-Destructive Tomato Harvesting
Abstract
Non-destructive harvesting of fresh tomatoes by agricultural robots requires the robotic arm to approach the fruit in the correct posture. This is difficult because fruit pedicels are small, the environment is cluttered, and the poses of tomatoes and pedicels vary widely. Accurately identifying and localizing the cutting points and estimating their 6D spatial pose is therefore critical for efficient, non-destructive harvesting. To address these challenges, we propose TomatoPoseNet, a keypoint-based pose estimation model tailored to the agronomic requirements of tomato harvesting. The backbone network is the CSEFLayer, an efficient fusion block (EFBlock) based on the CSPLayer, designed to fuse multiscale features while using computational resources efficiently. A parallel deep fusion network (PDFN) serves as the neck network, integrating features from multiple parallel branches. Simple coordinate classification (SimCC) is employed as the head network for keypoint detection, and a StripPooling block is introduced to enhance the model’s ability to capture features of different scales and shapes by applying strip pooling in the horizontal and vertical directions. Finally, a geometric model is constructed from the predicted 3D keypoints to estimate the 6D pose of the cutting points. The results show the following: (1) The average precision for keypoint detection reached 82.51%, surpassing ViTPose, HRNet, Lite-HRNet, Hourglass, and RTMPose by 3.78%, 9.46%, 11%, 9.14%, and 10.07%, respectively. (2) The mean absolute errors (MAEs) of the yaw and pitch angles for 6D pose estimation of the cutting points were 2.98° and 3.54°, respectively, with maximum errors within 6.5°, meeting the requirements for harvesting.
The experimental results demonstrate that the proposed method can accurately locate the 6D pose of cutting points in an unstructured tomato harvesting environment, enabling non-destructive harvesting.
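The abstract does not spell out the geometric model, so the following is only a minimal sketch of how yaw and pitch could be derived from two predicted 3D keypoints and how an MAE over such angles is computed. The function names, the choice of two keypoints along the pedicel, and the axis convention (x right, y up, z forward) are all assumptions made for illustration; they are not taken from the paper.

```python
import math


def yaw_pitch_deg(p_start, p_end):
    """Yaw and pitch (degrees) of the direction from p_start to p_end.

    Assumed convention (hypothetical, not from the paper):
    x right, y up, z forward. Yaw is measured in the x-z plane
    from +z toward +x; pitch is the elevation above the x-z plane.
    """
    dx, dy, dz = (e - s for s, e in zip(p_start, p_end))
    if dx == 0 and dy == 0 and dz == 0:
        raise ValueError("keypoints coincide; direction is undefined")
    yaw = math.degrees(math.atan2(dx, dz))
    # hypot(dx, dz) is the length of the projection onto the x-z plane.
    pitch = math.degrees(math.atan2(dy, math.hypot(dx, dz)))
    return yaw, pitch


def mean_absolute_error(predicted, reference):
    """Mean absolute error between two equal-length angle sequences.

    Assumes the angles are small enough that wrap-around at +/-180
    degrees does not matter.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    total = sum(abs(p - r) for p, r in zip(predicted, reference))
    return total / len(predicted)
```

For example, under this convention `yaw_pitch_deg((0, 0, 0), (1, 0, 1))` describes a direction halfway between the forward and right axes with no elevation. A real system would also need to fix the roll component and the position of the cutting point to obtain a full 6D pose, which this sketch leaves out.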
Keywords