Sensors (Oct 2023)

Adaptation of YOLOv7 and YOLOv7_tiny for Soccer-Ball Multi-Detection with DeepSORT for Tracking by Semi-Supervised System

  • Jorge Armando Vicente-Martínez,
  • Moisés Márquez-Olivera,
  • Abraham García-Aliaga,
  • Viridiana Hernández-Herrera

DOI
https://doi.org/10.3390/s23218693
Journal volume & issue
Vol. 23, no. 21
p. 8693

Abstract

Read online

Object recognition and tracking have long been a challenge, drawing considerable attention from analysts and researchers, particularly in the realm of sports, where it plays a pivotal role in refining trajectory analysis. This study introduces a different approach, advancing the detection and tracking of soccer balls through the implementation of a semi-supervised network. Leveraging the YOLOv7 convolutional neural network, and incorporating the focal loss function, the proposed framework achieves a remarkable 95% accuracy in ball detection. This strategy outperforms previous methodologies researched in the bibliography. The integration of focal loss brings a distinctive edge to the model, improving the detection challenge for soccer balls on different fields. This pivotal modification, in tandem with the utilization of the YOLOv7 architecture, results in a marked improvement in accuracy. Following the attainment of this result, the implementation of DeepSORT enriches the study by enabling precise trajectory tracking. In the comparative analysis between versions, the efficacy of this approach is underscored, demonstrating its superiority over conventional methods with default loss function. In the Materials and Methods section, a meticulously curated dataset of soccer balls is assembled. Combining images sourced from freely available digital media with additional images from training sessions and amateur matches taken by ourselves, the dataset contains a total of 6331 images. This diverse dataset enables comprehensive testing, providing a solid foundation for evaluating the model’s performance under varying conditions, which is divided by 5731 images for supervised system and the last 600 images for semi-supervised. The results are striking, with an accuracy increase to 95% with the focal loss function. The visual representations of real-world scenarios underscore the model’s proficiency in both detection and classification tasks, further affirming its effectiveness, the impact, and the innovative approach. In the discussion, the hardware specifications employed are also touched on, any encountered errors are highlighted, and promising avenues for future research are outlined.

Keywords