Heliyon (Jan 2024)
YOLO-SK: A lightweight multiscale object detection algorithm
Abstract
YOLOv5 is an excellent object-detection model. However, it fails to fully use multiscale information when detecting objects with significant scale variations, and it may use irrelevant contextual information, leading to incorrect predictions, particularly on low-performance devices. In this study, we selected the lightweight YOLOv5s as the baseline model and proposed an improved model called YOLO-SK to overcome these limitations. YOLO-SK introduces several key improvements, the most important being the combination of a weighted dense feature fusion network and an SK attention prediction head. The proposed weighted dense feature fusion network dynamically fuses features at different scales using learnable weights and cross-layer connections. This balances feature fusion across the output feature maps of different scales, thereby enriching the effective information in the fused feature maps. The prediction head equipped with the SK attention mechanism broadens the model's receptive field and sharpens its focus on target characteristics, making it possible to extract more information about the target from the feature maps output by the weighted dense feature fusion network. In addition, to improve the model in terms of both accuracy and size, we adopted the SIoU loss function and Ghost Conv. Together, these improvements gave the model a more precise and thorough understanding of the input scene. Extensive experiments on the PASCAL VOC 2007 and 2012 datasets showed that YOLO-SK achieved considerable gains in prediction accuracy over the baseline model (YOLOv5s) while keeping the same level of model complexity. Specifically, mAP@0.5 increased by 2.6 % and mAP@0.5:.95 increased by 4.8 %.
The improvements described in this paper could serve as a starting point for further research on improving the accuracy of multiscale object detection models for low-performance devices.
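To make the idea of fusing multiscale features with learnable weights concrete, the following is a minimal sketch in plain Python. It assumes a BiFPN-style "fast normalized fusion" rule, which the abstract does not confirm as the exact scheme used in YOLO-SK; the function name `fast_normalized_fusion`, the `eps` value, and the flat-list representation of feature maps are all illustrative assumptions, and the features are assumed to have already been resized to a common shape.

```python
from typing import List, Sequence


def fast_normalized_fusion(
    features: Sequence[Sequence[float]],
    raw_weights: Sequence[float],
    eps: float = 1e-4,
) -> List[float]:
    """Fuse same-shaped feature vectors with non-negative, normalized weights.

    Assumption: a BiFPN-style rule, w_i = relu(w_i) / (sum_j relu(w_j) + eps).
    In a real network, raw_weights would be trainable parameters.
    """
    if len(features) != len(raw_weights):
        raise ValueError("one weight is required per input feature")
    length = len(features[0])
    if any(len(f) != length for f in features):
        raise ValueError("all features must have the same length")

    # Clamp negative weights to zero (ReLU) so that no input is subtracted.
    positive = [max(w, 0.0) for w in raw_weights]
    # Normalize so the weights sum to (almost) one; eps avoids division by zero.
    total = sum(positive) + eps
    normalized = [w / total for w in positive]

    # Element-wise weighted sum across the input features.
    return [
        sum(w * f[i] for w, f in zip(normalized, features))
        for i in range(length)
    ]


if __name__ == "__main__":
    shallow = [1.0, 2.0]  # hypothetical high-resolution feature
    deep = [3.0, 4.0]     # hypothetical low-resolution feature, already upsampled
    print(fast_normalized_fusion([shallow, deep], [1.0, 1.0]))
```

With equal weights the result is close to the element-wise mean of the inputs; a weight that is zero or negative removes its input from the fusion entirely, which is how such a scheme can learn to down-weight an uninformative scale.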