Motion estimation and multi-stage association for tracking-by-detection

Ye Li; Lei Wu; Yiping Chen; Xinzhong Wang; Guangqiang Yin; Zhiguo Wang

doi:10.1007/s40747-023-01273-3

Complex & Intelligent Systems (Nov 2023)

Affiliations

Ye Li: Shenzhen Institute of Information Technology
Lei Wu: University of Electronic Science and Technology of China
Yiping Chen: University of Electronic Science and Technology of China
Xinzhong Wang: Shenzhen Institute of Information Technology
Guangqiang Yin: University of Electronic Science and Technology of China
Zhiguo Wang: University of Electronic Science and Technology of China

DOI: https://doi.org/10.1007/s40747-023-01273-3
Journal volume & issue: Vol. 10, no. 2
pp. 2445 – 2458

Abstract

Multi-object tracking (MOT) aims to locate and identify objects in videos. As deep learning has brought excellent performance to object detection, tracking-by-detection (TBD) has gradually become a mainstream tracking framework. However, some drawbacks still exist in the current TBD framework: (1) In the detection stage, bounding boxes are predicted inaccurately because the actual aspect ratio of pedestrians in surveillance scenes is overlooked. (2) In the motion prediction stage, the width of the bounding box in the next frame is predicted indirectly through the aspect ratio, which increases the width prediction error. (3) In the data association stage, association is performed only for high-confidence detection boxes, and the low-confidence boxes caused by occlusion are discarded, resulting in fragmented trajectories. To address these issues, we propose a multi-target tracking model incorporating motion estimation and multi-stage association (MEMA). First, the aspect ratio of the ground-truth bounding box is introduced to improve the fit between the detection and the ground-truth bounding box, and we design an elliptical Gaussian kernel to improve the positioning accuracy of the object center point. Then, the state vector of the Kalman filter is modified to predict the width and its corresponding velocity directly. This reduces the width error of the predicted box and eliminates the velocity error of the motion estimation, which leads to a predicted bounding box that fits pedestrians better. Finally, we propose a multi-stage association strategy to associate detection boxes of different confidence levels. Without using appearance features, the strategy reduces the impact of occlusion and improves tracking performance. On the MOT17 test set, the proposed method achieves a MOTA of 74.3% and an IDF1 of 72.4%, outperforming the current state of the art (SOTA).
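
The change to the Kalman filter state described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact state layout, noise settings, or frame interval, so the 8-dimensional state [cx, cy, w, h, vcx, vcy, vw, vh], the constant-velocity model, and the names make_transition and predict below are all assumptions made for illustration, not the authors' implementation. The point shown is only that the width w and its velocity vw are state components themselves, rather than being recovered from a predicted aspect ratio.

```python
import numpy as np


def make_transition(dt: float = 1.0) -> np.ndarray:
    """Constant-velocity transition matrix for the assumed 8-D state
    [cx, cy, w, h, vcx, vcy, vw, vh] (hypothetical layout)."""
    F = np.eye(8)
    # Each of cx, cy, w, h advances by dt times its own velocity.
    F[:4, 4:] = dt * np.eye(4)
    return F


def predict(x: np.ndarray, P: np.ndarray, F: np.ndarray, Q: np.ndarray):
    """Standard Kalman prediction step: propagate mean and covariance."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred


# Illustrative values only: a 40x100 box whose width grows by 1 px per frame.
x = np.array([100.0, 200.0, 40.0, 100.0, 2.0, 0.0, 1.0, 0.5])
P = np.eye(8)            # assumed initial covariance
Q = 0.01 * np.eye(8)     # assumed process noise

x_pred, P_pred = predict(x, P, make_transition(), Q)
predicted_width = x_pred[2]  # read directly from the state, no aspect ratio involved
```

In a tracker that stores the aspect ratio a = w / h instead, the predicted width has to be computed as the product of the predicted a and the predicted h, so errors in both terms feed into the width; keeping w in the state avoids that extra step.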

Published in Complex & Intelligent Systems

ISSN: 2199-4536 (Print); 2198-6053 (Online)
Publisher: Springer
Country of publisher: Switzerland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science; Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: https://www.springer.com/journal/40747
