IEEE Access (Jan 2025)
Real-Time Long-Range Object Tracking Based on Ensembled Model
Abstract
Accurate object distance estimation combined with recognition is essential for many computer vision (CV) applications, including autonomous vehicles and military operations. Although supervised and self-supervised techniques have advanced significantly for short-range, real-time recognition, existing methods often focus on monocular depth estimation and remain constrained by the limitations of supervised deep learning (DL) models. Exploiting temporal information from sequential frames through attention mechanisms offers a promising avenue for addressing these challenges and improving recognition quality. For a real-time military object recognition system, this study integrates RGB images with depth maps from the KITTI dataset for short-range measurements. Because KITTI is limited to short-range scenes, a synthetic dataset is generated for long-range object recognition. YOLOv8 is trained for real-time object detection on KITTI and on the synthetic dataset, the latter supporting long-range analysis. Our method achieves an RMSE of 1.24 meters and an RMSE(log) of 0.18 for depth estimation, outperforming existing approaches. Furthermore, the system accurately detects objects at distances of up to 250 meters, with an average inference time of 15 ms per frame for short-range detection and 18 ms per frame for long-range detection. Comparative evaluations against state-of-the-art methods demonstrate the superior accuracy and efficiency of our approach.
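The abstract reports depth error as RMSE (in meters) and RMSE(log) but does not define them. The sketch below assumes the conventional definitions used in KITTI-style depth evaluation, with the natural logarithm for RMSE(log). It is illustrative only, not the authors' evaluation code, and the function names `depth_rmse` and `depth_rmse_log` are hypothetical.

```python
import math
from typing import Sequence


def _check(pred: Sequence[float], gt: Sequence[float]) -> None:
    # Both inputs must be non-empty and paired element by element.
    if not pred or len(pred) != len(gt):
        raise ValueError("pred and gt must be non-empty and of equal length")


def depth_rmse(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Root-mean-square error between predicted and ground-truth depths (meters)."""
    _check(pred, gt)
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(pred))


def depth_rmse_log(pred: Sequence[float], gt: Sequence[float]) -> float:
    """RMSE of natural-log depths; penalizes relative rather than absolute error."""
    _check(pred, gt)
    if any(v <= 0 for v in (*pred, *gt)):
        raise ValueError("depths must be positive to take logarithms")
    return math.sqrt(
        sum((math.log(p) - math.log(g)) ** 2 for p, g in zip(pred, gt)) / len(pred)
    )


if __name__ == "__main__":
    # Toy depths in meters (made-up values, not from the paper).
    pred = [10.0, 20.0, 40.0]
    gt = [11.0, 20.0, 38.0]
    print(depth_rmse(pred, gt), depth_rmse_log(pred, gt))
```

Because RMSE(log) compares logarithms, a 2 m error at 200 m contributes far less than a 2 m error at 10 m. This matters when a single system covers both short-range and long-range targets.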
Keywords