ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences (Nov 2024)

Real-Time Leaves Segmentation in RGB Images with Deep Learning in a Single-Board Computer

  • C. D. S. Faria Júnior,
  • M. H. Shimabukuro,
  • A. M. G. Tommaselli,
  • M. R. O. D. A. Maximo,
  • L. R. Porto,
  • N. N. Imai

DOI
https://doi.org/10.5194/isprs-annals-X-3-2024-139-2024
Journal volume & issue
Vol. X-3-2024
pp. 139 – 146

Abstract

This work proposed and evaluated methods for real-time leaf segmentation on a single-board computer. The main aim was to explore state-of-the-art techniques based on the YOLO algorithm for real-time operation. For this purpose, the available variants of YOLOv8 and YOLOv9 were evaluated, and a semi-automatic labelling method based on the Segment Anything Model (SAM) was used. Because labelling requires delimiting the leaf contour, this method made it possible to create a larger and more accurate dataset than the purely manual procedure. In addition, the cost-benefit of the applied algorithms and methods was assessed, considering the computational demand as well as the accuracy, recall, and precision delivered by these techniques. Both a quantitative analysis of the trained architectures' metrics and a qualitative examination through direct observation of images were conducted to identify crucial aspects. The experiments were conducted with a post-processed dataset, and suitability for real-time applications was judged by the elapsed time for segmentation. We concluded that the YOLOv8n architecture is the best among those tested, with a precision of 0.9064 and a recall of 0.7233. This architecture offers the best trade-off between computational cost and real-time performance, performing segmentation in 310 ms on the NVIDIA Jetson Nano board. Furthermore, when computational cost is not a constraint or a longer segmentation time is acceptable, the YOLOv8m network may be recommended if recall matters more than precision. This network achieved a precision of 0.8556 and a recall of 0.7726, and it performed better at segmenting leaves located in more complex parts of the image.
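
To make the precision/recall trade-off between the two recommended networks easier to compare, the reported figures can be combined into a single number and the latency converted into a frame rate. The sketch below is illustrative only: the F1 score (harmonic mean of precision and recall) and the frames-per-second conversion are not reported in the abstract, and the names `f1_score`, `reported`, and `latency_ms` are assumptions introduced here.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    return 2.0 * precision * recall / (precision + recall)


# Precision and recall as stated in the abstract.
reported = {
    "YOLOv8n": (0.9064, 0.7233),
    "YOLOv8m": (0.8556, 0.7726),
}

# Derived values: F1 is NOT reported in the abstract, only computed here.
f1 = {name: f1_score(p, r) for name, (p, r) in reported.items()}

# Segmentation time stated for YOLOv8n on the NVIDIA Jetson Nano.
latency_ms = 310.0
frames_per_second = 1000.0 / latency_ms  # assumes one image per inference

for name, (p, r) in reported.items():
    print(f"{name}: precision={p:.4f} recall={r:.4f} F1={f1[name]:.4f}")
print(f"YOLOv8n at {latency_ms:.0f} ms per image: {frames_per_second:.2f} frames/s")
```

Under this derived metric the two networks score within about 0.01 of each other, which is consistent with the abstract's framing: the choice between them rests mainly on computational cost and on whether recall or precision matters more for the application.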