End-to-end multiple object tracking in high-resolution optical sensors of drones with transformer models

Yubin Yuan; Yiquan Wu; Langyue Zhao; Yuqi Liu; Yaxuan Pang

doi:10.1038/s41598-024-75934-9

Scientific Reports (Oct 2024)

Affiliations

Yubin Yuan: College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics
Yiquan Wu: College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics
Langyue Zhao: College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics
Yuqi Liu: College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics
Yaxuan Pang: College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics

DOI: https://doi.org/10.1038/s41598-024-75934-9
Journal volume & issue: Vol. 14, no. 1, pp. 1–16

Abstract

Drone aerial imaging has become increasingly important across numerous fields as drone optical sensor technology continues to advance. One critical challenge in this domain is achieving both accurate and efficient multi-object tracking. Traditional deep learning methods often separate object identification from tracking, leading to increased complexity and potential performance degradation. Conventional approaches rely heavily on manual feature engineering and intricate algorithms, which can further limit efficiency. To overcome these limitations, we propose a novel Transformer-based end-to-end multi-object tracking framework. This innovative method leverages self-attention mechanisms to capture complex inter-object relationships, seamlessly integrating object detection and tracking into a unified process. By utilizing end-to-end training, our approach simplifies the tracking pipeline, leading to significant performance improvements. A key innovation in our system is the introduction of a trajectory detection label matching technique. This technique assigns labels based on a comprehensive assessment of object appearance, spatial characteristics, and Gaussian features, ensuring more precise and logical label assignments. Additionally, we incorporate cross-frame self-attention mechanisms to extract long-term object properties, providing robust information for stable and consistent tracking. We further enhance tracking performance through a newly developed self-characteristics module, which extracts semantic features from trajectory information across both current and previous frames. This module ensures that the long-term interaction modules maintain semantic consistency, allowing for more accurate and continuous tracking over time. The refined data and stored trajectories are then used as input for subsequent frame processing, creating a feedback loop that sustains tracking accuracy. Extensive experiments conducted on the VisDrone and UAVDT datasets demonstrate the superior performance of our approach in drone-based multi-object tracking.
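
The abstract names three cues for trajectory-detection label matching (appearance, spatial characteristics, and Gaussian features) but does not give the formulation. The following is only an illustrative sketch of how such cues could be combined into one matching cost, not the authors' method. Everything specific here is an assumption: the function names, the weights, the `scale` constant, the use of cosine similarity for appearance, IoU for the spatial cue, and a Wasserstein-style distance between box-derived 2D Gaussians for the Gaussian cue. The brute-force assignment is only practical for a handful of objects; a real tracker would typically use the Hungarian algorithm instead.

```python
import math
from itertools import permutations

def iou(a, b):
    """Spatial cue: IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def cosine_similarity(u, v):
    """Appearance cue: cosine similarity of two embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm > 0 else 0.0

def gaussian_similarity(a, b, scale=50.0):
    """Gaussian cue (assumed form): treat each box as an axis-aligned 2D
    Gaussian with mean at the box centre and standard deviations equal to
    half the width and height, then map the 2-Wasserstein distance between
    the two Gaussians into (0, 1]. `scale` is a hypothetical constant."""
    ga = ((a[0] + a[2]) / 2, (a[1] + a[3]) / 2, (a[2] - a[0]) / 2, (a[3] - a[1]) / 2)
    gb = ((b[0] + b[2]) / 2, (b[1] + b[3]) / 2, (b[2] - b[0]) / 2, (b[3] - b[1]) / 2)
    w2_squared = sum((p - q) ** 2 for p, q in zip(ga, gb))
    return math.exp(-math.sqrt(w2_squared) / scale)

def pair_cost(track, det, weights, scale):
    """Weighted cost in [0, 1]-ish range; lower means a better match."""
    w_app, w_iou, w_gauss = weights
    sim = (w_app * cosine_similarity(track["feat"], det["feat"])
           + w_iou * iou(track["box"], det["box"])
           + w_gauss * gaussian_similarity(track["box"], det["box"], scale))
    return 1.0 - sim

def match(tracks, detections, weights=(0.4, 0.3, 0.3), scale=50.0):
    """Return (track_index, detection_index) pairs with minimum total cost.
    Brute force over permutations: fine for a toy example only."""
    cost = [[pair_cost(t, d, weights, scale) for d in detections] for t in tracks]
    n_t, n_d = len(tracks), len(detections)
    best, best_pairs = math.inf, []
    if n_t <= n_d:
        for perm in permutations(range(n_d), n_t):
            total = sum(cost[i][j] for i, j in enumerate(perm))
            if total < best:
                best, best_pairs = total, list(enumerate(perm))
    else:
        for perm in permutations(range(n_t), n_d):
            total = sum(cost[i][j] for j, i in enumerate(perm))
            if total < best:
                best, best_pairs = total, sorted((i, j) for j, i in enumerate(perm))
    return best_pairs

# Toy data: two existing trajectories and two detections listed in swapped order.
tracks = [
    {"box": (10, 10, 50, 50), "feat": (1.0, 0.0)},
    {"box": (200, 200, 240, 240), "feat": (0.0, 1.0)},
]
detections = [
    {"box": (202, 198, 242, 238), "feat": (0.1, 0.9)},
    {"box": (12, 11, 52, 51), "feat": (0.9, 0.1)},
]
print(match(tracks, detections))  # [(0, 1), (1, 0)]
```

In this toy case all three cues agree, so each trajectory is paired with the nearby, similar-looking detection. The point of combining cues is that for small or fast-moving aerial targets the IoU term can drop to zero while the Gaussian and appearance terms still provide a usable signal.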

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/
