Agriculture (Jun 2024)

YOLOv8n-DDA-SAM: Accurate Cutting-Point Estimation for Robotic Cherry-Tomato Harvesting

  • Gengming Zhang,
  • Hao Cao,
  • Yangwen Jin,
  • Yi Zhong,
  • Anbang Zhao,
  • Xiangjun Zou,
  • Hongjun Wang

DOI
https://doi.org/10.3390/agriculture14071011
Journal volume & issue
Vol. 14, no. 7
p. 1011

Abstract

Read online

Accurately identifying cherry-tomato picking points and obtaining their coordinate locations is critical to the success of cherry-tomato picking robots. However, previous methods for semantic segmentation alone or combining object detection with traditional image processing have struggled to accurately determine the cherry-tomato picking point due to challenges such as leaves as well as targets that are too small. In this study, we propose a YOLOv8n-DDA-SAM model that adds a semantic segmentation branch to target detection to achieve the desired detection and compute the picking point. To be specific, YOLOv8n is used as the initial model, and a dynamic snake convolutional layer (DySnakeConv) that is more suitable for the detection of the stems of cherry-tomato is used in neck of the model. In addition, the dynamic large convolutional kernel attention mechanism adopted in backbone and the use of ADown convolution resulted in a better fusion of the stem features with the neck features and a certain decrease in the number of model parameters without loss of accuracy. Combined with semantic branch SAM, the mask of picking points is effectively obtained and then the accurate picking point is obtained by simple shape-centering calculation. As suggested by the experimental results, the proposed YOLOv8n-DDA-SAM model is significantly improved from previous models not only in detecting stems but also in obtaining stem’s masks. In the [email protected] and F1-score, the YOLOv8n-DDA-SAM achieved 85.90% and 86.13% respectively. Compared with the original YOLOv8n, YOLOv7, RT-DETR-l and YOLOv9c, the [email protected] has improved by 24.7%, 21.85%, 19.76%, 15.99% respectively. F1-score has increased by 16.34%, 12.11%, 10.09%, 8.07% respectively, and the number of parameters is only 6.37M. In the semantic segmentation branch, not only does it not need to produce relevant datasets, but also improved its mIOU by 11.43%, 6.94%, 5.53%, 4.22% and [email protected] by 12.33%, 7.49%, 6.4%, 5.99% compared to Deeplabv3+, Mask2former, DDRNet and SAN respectively. In summary, the model can well satisfy the requirements of high-precision detection and provides a strategy for the detection system of the cherry-tomato.

Keywords