IEEE Access (Jan 2024)

An In-Field Dynamic Vision-Based Analysis for Vineyard Yield Estimation

  • David Ahmedt-Aristizabal,
  • Daniel Smith,
  • Muhammad Rizwan Khokher,
  • Xun Li,
  • Adam L. Smith,
  • Lars Petersson,
  • Vivien Rolland,
  • Everard J. Edwards

DOI
https://doi.org/10.1109/ACCESS.2024.3431244
Journal volume & issue
Vol. 12
pp. 102146–102166

Abstract

Accurately predicting grape yield in vineyards is essential for strategic decision-making in the wine industry. Current methods are labour-intensive, costly, and lack spatial coverage, which reduces both accuracy and cost-efficiency. Efforts to automate and enhance yield estimation focus on scaling up fruit weight assessments. Machine learning, particularly deep learning, shows promise in improving accuracy through automatic feature extraction and hierarchical representation. However, most of these methods analyse a single time point, and solutions that exploit temporal information captured across sequential frames remain poorly developed. This paper addresses this gap by introducing a yield estimation system that uses publicly available data repositories, such as Embrapa WGISD, alongside an in-house dataset collected with a Blackmagic camera at the pre-harvest stage. The system estimates grape yield through bunch weight regression: bunch weight estimates are obtained by summing samples randomly drawn from the grape bunch weight distribution, which is calibrated empirically. Grapevine bunches are detected and segmented using Mask R-CNN with a Swin Transformer backbone, and a SiamFC-based tracking mechanism estimates the number of unique bunches per panel or row. The number of berries in each tracked bunch is determined using a density-based approach known as multitask point supervision. Our experiments demonstrate the effectiveness of the proposed system, achieving harvested weight errors of less than 5% in two of the three vineyard panels. A larger harvest weight error (around 15%) was observed in the third panel, where the dense concentration of bunches caused tracking inaccuracies. However, these errors should be contrasted with the up to 30% error of current practice, highlighting the potential of machine vision for hands-off yield estimation at scale.
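The core yield-estimation idea in the abstract, summing samples randomly drawn from an empirically calibrated bunch weight distribution over the tracked bunches, can be illustrated with a minimal Monte Carlo sketch. The function name, the calibration values, and the bunch count below are hypothetical, not from the paper; the paper's actual regression and calibration details are more involved.

```python
import numpy as np

def estimate_panel_yield(n_bunches, weight_samples, n_draws=10_000, seed=None):
    """Monte Carlo yield estimate for one panel: for each of the
    n_bunches tracked bunches, draw a weight from the empirical
    bunch-weight samples and sum; repeat n_draws times."""
    rng = np.random.default_rng(seed)
    # each row simulates one panel: n_bunches weights drawn with replacement
    draws = rng.choice(np.asarray(weight_samples, dtype=float),
                       size=(n_draws, n_bunches), replace=True)
    totals = draws.sum(axis=1)
    return totals.mean(), totals.std()

# hypothetical calibration weights (grams) and a panel with 50 tracked bunches
calib = [95.0, 120.0, 150.0, 180.0, 205.0]
mean_g, std_g = estimate_panel_yield(50, calib, seed=0)
```

With the tracker supplying the unique bunch count per panel, the returned mean gives the expected harvest weight and the standard deviation a rough uncertainty band.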

Keywords