IEEE Access (Jan 2024)

Detecting Multi-Scale Rose Apple Skin and Defects Using Instance Segmentation With Anchors Optimization

  • Wilasinee Paewboontra
  • Nitikarn Nimsuk

DOI
https://doi.org/10.1109/ACCESS.2024.3463733
Journal volume & issue
Vol. 12
pp. 138789 – 138800

Abstract

Skin defects significantly impact the quality of rose apples. The presence of damaging defects, such as holes, rot, or piercing marks, in exported products can erode customer trust. In addition, the products are graded according to the total area occupied by acceptable defects such as scratches, rubbing, and shallow scars. This study investigated the application of instance segmentation networks, specifically single-stage and two-stage approaches, to detecting the areas of rose apple skin and surface lesions. The surface defects were divided into two classes: acceptable and damaging. To improve the networks' capability, anchor boxes were optimized for detecting multi-size rose apple skin and defects. Additionally, the network size of both instance segmentation networks was adjusted. The results revealed that optimizing anchor boxes based on the training data distribution significantly improved detection performance for both single-stage and two-stage networks, compared with the K-means clustering approach. In contrast, adjusting the network size had a less pronounced impact. Finally, YOLOv5 with optimized anchor boxes significantly increased the F1-score for detecting defects by 23.98%. However, this came with a trade-off: its skin detection F1-score was lower, though only by 5.41%, compared with Mask R-CNN. Furthermore, YOLOv5 was more robust than Mask R-CNN in distinguishing defects from visual characteristics that resemble them, such as brightness, shadow, and skin texture. The outcomes provided strong evidence that YOLOv5 offers superior efficiency in detecting multi-scale objects, despite having a network size and inference time that are 1.89 and 1.49 times smaller, respectively, than those of Mask R-CNN.
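For orientation, the K-means baseline mentioned in the abstract can be illustrated with a common variant that clusters ground-truth box widths and heights and assigns each box to the anchor with the highest IoU. The abstract does not specify the authors' exact procedure, so the following is only a minimal sketch under that assumption. The function names (`wh_iou`, `kmeans_anchors`) and the synthetic box sizes are hypothetical, and the training-data-distribution optimization that the study reports as superior is not reproduced here.

```python
import numpy as np

def wh_iou(boxes, anchors):
    """IoU between (N, 2) box sizes and (K, 2) anchor sizes, aligned at a shared corner."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0])
             * np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    area_b = (boxes[:, 0] * boxes[:, 1])[:, None]
    area_a = (anchors[:, 0] * anchors[:, 1])[None, :]
    return inter / (area_b + area_a - inter)

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (width, height) pairs into k anchors, assigning by highest IoU."""
    rng = np.random.default_rng(seed)
    # Initialise anchors from k distinct training boxes.
    anchors = boxes[rng.choice(len(boxes), size=k, replace=False)].astype(float)
    for _ in range(iters):
        assign = wh_iou(boxes, anchors).argmax(axis=1)
        new = anchors.copy()
        for j in range(k):
            members = boxes[assign == j]
            if len(members):  # keep the old anchor if its cluster is empty
                new[j] = members.mean(axis=0)
        if np.allclose(new, anchors):
            break
        anchors = new
    # Return anchors ordered from smallest to largest area.
    return anchors[np.argsort(anchors.prod(axis=1))]

# Hypothetical multi-scale data: small defect-like boxes and large skin-like boxes.
rng = np.random.default_rng(1)
boxes = np.vstack([rng.uniform(8, 16, (50, 2)), rng.uniform(150, 220, (50, 2))])
anchors = kmeans_anchors(boxes, k=2)
mean_best_iou = wh_iou(boxes, anchors).max(axis=1).mean()
```

The mean best IoU between the boxes and their closest anchors gives a rough sense of how well a set of anchors fits the size distribution, which is the kind of fit that anchor optimization aims to improve.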

Keywords