IET Computer Vision (Aug 2024)

A deep learning framework for multi‐object tracking in team sports videos

  • Wei Cao,
  • Xiaoyong Wang,
  • Xianxiang Liu,
  • Yishuai Xu

DOI
https://doi.org/10.1049/cvi2.12266
Journal volume & issue
Vol. 18, no. 5
pp. 574 – 590

Abstract


In response to the challenges of Multi‐Object Tracking (MOT) in sports scenes, such as severe occlusions, similar appearances, drastic pose changes and complex motion patterns, this paper proposes CTGMOT (CNN‐Transformer‐GNN‐based MOT), a deep‐learning framework for tracking multiple athletes in sports videos that jointly models detection, appearance and motion features. First, a detection network combining Convolutional Neural Networks (CNNs) and Transformers is constructed to extract both local and global image features, and a design of parallel dual‐branch decoders fuses appearance and motion features. Second, graph models built with Graph Neural Networks (GNNs) capture the spatio‐temporal correlations between object and trajectory features through inter‐frame and intra‐frame associations. Experimental results on the public sports tracking dataset SportsMOT show that the proposed framework outperforms other state‐of‐the‐art MOT methods in complex sports scenes. In addition, the framework generalises well to the benchmark datasets MOT17 and MOT20.
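The abstract describes the association stage only at a high level. As a rough point of reference, here is a minimal sketch of the conventional tracking-by-detection step that a learned association model such as CTGMOT's GNN is meant to improve on. Appearance and motion affinities between existing trajectories and new detections are fused into one matrix, and detections are then assigned to trajectories. This is not the authors' method. The learned GNN is replaced by a fixed weighted sum of cosine similarity (appearance) and box IoU (motion), and the Hungarian algorithm is replaced by greedy matching. The function names, `alpha` and `thresh` are hypothetical and chosen for illustration only.

```python
import numpy as np


def cosine_sim(a, b, eps=1e-8):
    """Pairwise cosine similarity between rows of a (M, D) and b (N, D)."""
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return a @ b.T


def iou_matrix(a, b):
    """Pairwise IoU between boxes a (M, 4) and b (N, 4) given as (x1, y1, x2, y2)."""
    x1 = np.maximum(a[:, None, 0], b[None, :, 0])
    y1 = np.maximum(a[:, None, 1], b[None, :, 1])
    x2 = np.minimum(a[:, None, 2], b[None, :, 2])
    y2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter + 1e-8)


def associate(track_emb, track_boxes, det_emb, det_boxes, alpha=0.5, thresh=0.3):
    """Fuse appearance (cosine) and motion (IoU) affinities, then match greedily.

    Hypothetical stand-in for a learned GNN association: alpha and thresh are
    illustrative hyperparameters, not values from the paper.
    Returns a list of (track_index, detection_index) pairs.
    """
    affinity = (alpha * cosine_sim(track_emb, det_emb)
                + (1 - alpha) * iou_matrix(track_boxes, det_boxes))
    matches, used_t, used_d = [], set(), set()
    # Visit candidate pairs from highest to lowest affinity
    # (greedy approximation of Hungarian matching).
    for flat in np.argsort(-affinity, axis=None):
        t, d = (int(i) for i in np.unravel_index(flat, affinity.shape))
        if affinity[t, d] < thresh:
            break  # all remaining pairs score even lower
        if t in used_t or d in used_d:
            continue  # each track and detection is matched at most once
        matches.append((t, d))
        used_t.add(t)
        used_d.add(d)
    return matches


# Two trajectories and two detections in the next frame (toy values).
track_emb = np.array([[1.0, 0.0], [0.0, 1.0]])
track_boxes = np.array([[0, 0, 10, 10], [20, 20, 30, 30]], dtype=float)
det_emb = np.array([[0.0, 1.0], [1.0, 0.1]])
det_boxes = np.array([[21, 21, 31, 31], [1, 1, 11, 11]], dtype=float)
print(associate(track_emb, track_boxes, det_emb, det_boxes))  # → [(1, 0), (0, 1)]
```

A fixed weighted sum treats appearance and motion as independent cues. The abstract's motivation for using GNNs is that, in sports footage with similar-looking players and heavy occlusion, the correlations between objects and trajectories are better learned from the graph than set by hand.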

Keywords