智慧农业 (Sep 2024)
MSH-YOLOv8: Mushroom Small Object Detection Method with Scale Reconstruction and Fusion
Abstract
[Objective]Traditional object detection algorithms applied in the agricultural field, such as those used for crop growth monitoring and harvesting, often suffer from insufficient accuracy. This is particularly problematic for small crops like mushrooms, where recognition and detection are more challenging. The introduction of small object detection technology promises to address these issues, potentially enhancing the precision, efficiency, and economic benefits of agricultural production management. However, achieving high accuracy in small object detection has remained a significant challenge, especially when dealing with varying image sizes and target scales. Although the YOLO series models excel in speed and large object detection, they still have shortcomings in small object detection. To address the issue of maintaining high accuracy amid changes in image size and target scale, a novel detection model, Multi-Strategy Handling YOLOv8 (MSH-YOLOv8), was proposed.[Methods]The proposed MSH-YOLOv8 model builds upon YOLOv8 by incorporating several key enhancements aimed at improving sensitivity to small-scale targets and overall detection performance. Firstly, an additional detection head was added to increase the model's sensitivity to small objects. To address computational redundancy and improve feature extraction, the Swin Transformer detection structure was introduced into the input module of the head network, creating what was termed the "Swin Head (SH)". Moreover, the model integrated the C2f_Deformable convolutionv4 (C2f_DCNv4) structure, which included deformable convolutions, and the Swin Transformer encoder structure, termed "Swinstage", to reconstruct the YOLOv8 backbone network. This optimization enhanced feature propagation and extraction capabilities, increasing the network's ability to handle targets with significant scale variations. Additionally, the normalization-based attention module (NAM) was employed to improve performance without compromising detection speed or computational complexity. To further enhance training efficacy and convergence speed, the original loss function CIoU was replaced with wise-intersection over union (WIoU) Loss. Furthermore, experiments were conducted using mushrooms as the research subject on the open Fungi dataset. Approximately 200 images with resolution sizes around 600×800 were selected as the main research material, along with 50 images each with resolution sizes around 200×400 and 1 000×1 200 to ensure representativeness and generalization of image sizes. During the data augmentation phase, a generative adversarial network (GAN) was utilized for resolution reconstruction of low-resolution images, thereby preserving semantic quality as much as possible. In the post-processing phase, dynamic resolution training, multi-scale testing, soft non-maximum suppression (Soft-NMS), and weighted boxes fusion (WBF) were applied to enhance the model's small object detection capabilities under varying scales.[Results and Discussions]The improved MSH-YOLOv8 achieved an average precision at 50% (AP50) intersection over union of 98.49% and an AP@50-95 of 75.29%, with the small object detection metric APs reaching 39.73%. Compared to mainstream models like YOLOv8, these metrics showed improvements of 2.34%, 4.06% and 8.55%, respectively. When compared to the advanced TPH-YOLOv5 model, the improvements were 2.14%, 2.76% and 6.89%, respectively. The ensemble model, MSH-YOLOv8-ensemble, showed even more significant improvements, with AP50 and APs reaching 99.14% and 40.59%, respectively, an increase of 4.06% and 8.55% over YOLOv8. These results indicate the robustness and enhanced performance of the MSH-YOLOv8 model, particularly in detecting small objects under varying conditions. Further application of this methodology on the Alibaba Cloud Tianchi databases "Tomato Detection" and "Apple Detection" yielded MSH-YOLOv8-t and MSH-YOLOv8-a models (collectively referred to as MSH-YOLOv8). Visual comparison of detection results demonstrated that MSH-YOLOv8 significantly improved the recognition of dense and blurry small-scale tomatoes and apples. This indicated that the MSH-YOLOv8 method possesses strong cross-dataset generalization capability and effectively recognizes small-scale targets. In addition to quantitative improvements, qualitative assessments showed that the MSH-YOLOv8 model could handle complex scenarios involving occlusions, varying lighting conditions, and different growth stages of the crops. This demonstrates the practical applicability of the model in real-world agricultural settings, where such challenges are common.[Conclusions]The MSH-YOLOv8 improvement method proposed in this study effectively enhances the detection accuracy of small mushroom targets under varying image sizes and target scales. This approach leverages multiple strategies to optimize both the architecture and the training process, resulting in a robust model capable of high-precision small object detection. The methodology's application to other datasets, such as those for tomato and apple detection, further underscores its generalizability and potential for broader use in agricultural monitoring and management tasks.
Keywords