Systems Science & Control Engineering (Dec 2024)

Grape cluster detection based on spatial-to-depth convolution and attention mechanism

  • Shuai Rong,
  • Xinghai Kong,
  • Ruibo Gao,
  • Zhiwei Hu,
  • Hua Yang

DOI
https://doi.org/10.1080/21642583.2023.2295949
Journal volume & issue
Vol. 12, no. 1

Abstract

Grape cluster detection is a crucial step in the vision pipeline of automated grape harvesting. In natural environments, background interference and occlusion make grape clusters difficult to detect. This study proposes three improvements to address this problem. First, the public dataset is enriched through data augmentation (random brightness change, left-right image flipping, and mosaic) to strengthen the model's robustness. Second, to counter the loss of information in grape cluster detection, a plug-and-play spatial-to-depth convolution (STD-Conv) module is added to enrich grape cluster feature information; it converts the spatial dimension of the input image into the depth dimension so that the original grape features are further fused. Third, a simple, parameter-free attention mechanism (SimAM) is applied to the backbone to increase the weight of grape targets and suppress the weight of background interference during feature extraction. Experiments show that combining STD-Conv and SimAM improves the accuracy of YOLOv4, YOLOv5, and YOLOX. The improved YOLOX model achieves the highest mean Average Precision (mAP) of 88.4%, with 87.8% precision and 79.5% recall. These findings demonstrate that the enhanced YOLOX model performs well for grape cluster detection. The study offers useful ideas for detecting grapes and other fruit in automated harvesting.
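
The abstract names the two added components but gives no implementation details, so the following is only a minimal sketch of the general ideas, not the authors' code. It assumes a downscaling factor of 2 for the spatial-to-depth rearrangement and a regularisation constant of 1e-4 for the SimAM-style weighting; the function names, the nested-list feature-map layout (channels x height x width), and the channel ordering are hypothetical choices made for illustration. Any convolution applied after the rearrangement is omitted.

```python
import math


def space_to_depth(fmap, scale=2):
    """Move spatial detail into channels without discarding any value.

    fmap is a C x H x W nested list; the result has C * scale**2 channels
    of size (H // scale) x (W // scale). scale=2 is an assumed setting.
    """
    height, width = len(fmap[0]), len(fmap[0][0])
    if height % scale or width % scale:
        raise ValueError("height and width must be divisible by scale")
    out = []
    # Each (dy, dx) offset picks one strided sub-map from every channel.
    for dy in range(scale):
        for dx in range(scale):
            for channel in fmap:
                out.append([row[dx::scale] for row in channel[dy::scale]])
    return out


def simam(fmap, e_lambda=1e-4):
    """Parameter-free attention in the style of SimAM (sketch).

    Values that deviate strongly from their channel mean receive a larger
    sigmoid weight. e_lambda is an assumed regularisation constant.
    """
    out = []
    for channel in fmap:
        values = [v for row in channel for v in row]
        if len(values) < 2:
            raise ValueError("each channel needs at least two values")
        mean = sum(values) / len(values)
        sq_dev_sum = sum((v - mean) ** 2 for v in values)
        denom = 4.0 * (sq_dev_sum / (len(values) - 1) + e_lambda)
        out.append([
            [v / (1.0 + math.exp(-((v - mean) ** 2 / denom + 0.5))) for v in row]
            for row in channel
        ])
    return out


# Toy 1 x 4 x 4 feature map with values 0..15 (illustrative data only).
feature_map = [[[float(4 * r + c) for c in range(4)] for r in range(4)]]
rearranged = space_to_depth(feature_map)  # 4 channels, each 2 x 2
attended = simam(rearranged)              # same shape, reweighted values
```

In this sketch the rearrangement halves the height and width while quadrupling the channel count, so the downsampling step keeps every input value, in contrast to a strided operation that skips positions. The attention step adds no learnable parameters, since the weights are computed from per-channel statistics alone.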

Keywords