Multiple object tracking based on multi‐task learning with strip attention

Yaoye Song; Peng Zhang; Wei Huang; Yufei Zha; Tao You; Yanning Zhang

doi:10.1049/ipr2.12327

IET Image Processing (Dec 2021)

Affiliations

Yaoye Song: School of Computer Science, Northwestern Polytechnical University, Xi'an, P.R. China
Peng Zhang: School of Computer Science, Northwestern Polytechnical University, Xi'an, P.R. China
Wei Huang: School of Information Engineering, Nanchang University, Nanchang, P.R. China
Yufei Zha: School of Computer Science, Northwestern Polytechnical University, Xi'an, P.R. China
Tao You: School of Computer Science, Northwestern Polytechnical University, Xi'an, P.R. China
Yanning Zhang: School of Computer Science, Northwestern Polytechnical University, Xi'an, P.R. China

DOI: https://doi.org/10.1049/ipr2.12327
Journal volume & issue: Vol. 15, no. 14
pp. 3661 – 3673

Abstract

Multiple object tracking (MOT) frameworks based on a bifurcated strategy are usually challenged by data association across separate model paths, which handle object localisation and appearance embedding independently. By incorporating re‐identification (re‐ID) as the appearance embedding model, more recent studies that combine both tasks in a single network have made great progress in tracking performance. Unfortunately, the improvement contributed by the re‐ID model makes it hard to balance accuracy and efficiency across the whole framework. To enhance overall tracking performance more effectively, real‐time detection needs to be considered together with other auxiliary means for MOT modelling. Therefore, in this study, a one‐shot multiple object tracking method based on multi‐task learning is proposed to obtain satisfactory performance in both speed and robustness. With an updated re‐training strategy for the detection backbone, a D2LA network is proposed to extract more distinctive fine‐grained features in the pedestrian recognition branch. Additionally, a strip attention module is introduced to further strengthen the discriminative capability of the tracking framework's features under occlusion. Experiments on the 2DMOT15, MOT16, MOT17, and MOT20 benchmark data sets show superior performance in comparison with other state‐of‐the‐art tracking approaches.

Published in IET Image Processing

ISSN: 1751-9659 (Print); 1751-9667 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Technology: Photography; Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: https://ietresearch.onlinelibrary.wiley.com/journal/17519667
