Apple Detection in Complex Scene Using the Improved YOLOv4 Model

Lin Wu; Jie Ma; Yuehua Zhao; Hong Liu

doi:10.3390/agronomy11030476

Agronomy (Mar 2021)

Affiliations

Lin Wu: School of Electronics and Information Engineering, Hebei University of Technology, Tianjin 300401, China
Jie Ma: School of Electronics and Information Engineering, Hebei University of Technology, Tianjin 300401, China
Yuehua Zhao: School of Electronics and Information Engineering, Hebei University of Technology, Tianjin 300401, China
Hong Liu: School of Electronics and Information Engineering, Hebei University of Technology, Tianjin 300401, China

DOI: https://doi.org/10.3390/agronomy11030476
Journal volume & issue: Vol. 11, no. 3, p. 476

Abstract

To enable an apple-picking robot to detect apples quickly and accurately against the complex backgrounds found in orchards, we propose an improved You Only Look Once version 4 (YOLOv4) model together with data augmentation methods. First, web crawler technology is used to collect relevant apple images from the Internet for labeling. To address the shortage of image data caused by random occlusion by leaves, a leaf illustration data augmentation method is proposed in this paper and used alongside traditional data augmentation techniques. Second, because the YOLOv4 model is large and computationally expensive, its backbone network, Cross Stage Partial Darknet53 (CSPDarknet53), is replaced by EfficientNet, and a convolution layer (Conv2D) is added to the three outputs to further adjust and extract the features; these changes make the model lighter and reduce its computational complexity. Finally, the apple detection experiment is performed on 2670 expanded samples. The test results show that the EfficientNet-B0-YOLOv4 model proposed in this paper achieves better detection performance than YOLOv3, YOLOv4, and Faster R-CNN with ResNet, which are state-of-the-art apple detection models. The average values of Recall, Precision, and F1 reach 97.43%, 95.52%, and 96.54%, respectively, and the average detection time per frame is 0.338 s, which shows that the proposed method can be applied in the vision systems of picking robots in the apple industry.
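The Recall, Precision, and F1 figures in the abstract are standard detection metrics. As a minimal sketch, assuming each predicted box has already been matched against the ground truth and counted as a true positive, false positive, or false negative, they can be computed as below. The class name `DetectionCounts`, the function names, and the example counts are hypothetical and are not taken from the paper; the matching rule (for example, an IoU threshold) is not specified in the abstract and is left out here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionCounts:
    """Hypothetical container for per-test-set detection counts."""
    true_positives: int   # predicted boxes matched to a labeled apple
    false_positives: int  # predicted boxes with no matching labeled apple
    false_negatives: int  # labeled apples that were not detected


def precision(counts: DetectionCounts) -> float:
    """Fraction of predicted apples that are real apples."""
    denom = counts.true_positives + counts.false_positives
    return counts.true_positives / denom if denom else 0.0


def recall(counts: DetectionCounts) -> float:
    """Fraction of real apples that were detected."""
    denom = counts.true_positives + counts.false_negatives
    return counts.true_positives / denom if denom else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2.0 * p * r / (p + r) if (p + r) else 0.0


# Illustrative counts only; these are NOT results from the paper.
example = DetectionCounts(true_positives=90, false_positives=10, false_negatives=5)
p = precision(example)
r = recall(example)
f1 = f1_score(p, r)
print(f"precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
```

The abstract reports average values of the three metrics, and it does not say how the averaging was done, so the reported F1 need not equal the harmonic mean of the reported average Precision and Recall.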

Published in Agronomy

ISSN: 2073-4395 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Agriculture
Website: http://www.mdpi.com/journal/agronomy
