Jisuanji kexue yu tansuo (Nov 2024)

3D Point Cloud Object Tracking Based on Multi-level Fusion of Transformer Features

  • LI Zhijie, LIANG Bowen, DING Xinmiao, GUO Wen

DOI
https://doi.org/10.3778/j.issn.1673-9418.2401071
Journal volume & issue
Vol. 18, no. 11
pp. 3006 – 3014

Abstract


In 3D point cloud object tracking, issues such as occlusion, point sparsity, and random noise frequently arise. To address these challenges, this paper proposes a 3D point cloud object tracking method based on multi-level fusion of Transformer features. The method consists mainly of a point attention embedding module and a point attention enhancement module, which are used in the feature extraction and feature matching stages, respectively. First, two attention mechanisms are embedded into each other to form the point attention embedding module, which is combined with the relation-aware sampling strategy proposed in PTTR (point relation transformer for tracking) so that features are extracted thoroughly. The extracted features are then fed into the point attention enhancement module, where cross-attention matches features from different levels in sequence, achieving deep fusion of global and local features. Furthermore, to obtain a discriminative feature fusion map, a residual network connects the fusion results of different layers. Finally, the feature fusion map is passed to the target prediction module to produce a precise prediction of the final 3D target object. Experiments on the KITTI, nuScenes, and Waymo datasets validate the effectiveness of the proposed method. Excluding few-shot data, the method improves tracking success by an average of 1.4 percentage points and precision by an average of 1.4 percentage points.
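To illustrate the matching scheme described above, the following is a minimal sketch of cross-attention fusion between template and search-region features, applied level by level and linked with residual connections. The module names, feature dimensions, attention head count, and overall structure are illustrative assumptions for exposition only, not the authors' implementation.

import torch
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    """Fuse template and search-region features at one level via cross-attention."""

    def __init__(self, dim: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=dim, num_heads=4, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, search_feat: torch.Tensor, template_feat: torch.Tensor) -> torch.Tensor:
        # Queries come from the search region, keys/values from the template,
        # so each search point aggregates matching template information.
        fused, _ = self.attn(query=search_feat, key=template_feat, value=template_feat)
        return self.norm(search_feat + fused)


class MultiLevelFusion(nn.Module):
    """Match features level by level and connect the per-level fusion maps residually."""

    def __init__(self, dim: int, num_levels: int = 3):
        super().__init__()
        self.levels = nn.ModuleList([CrossAttentionFusion(dim) for _ in range(num_levels)])

    def forward(self, search_feats, template_feats):
        fused = None
        for layer, s, t in zip(self.levels, search_feats, template_feats):
            out = layer(s, t)
            # Residual connection across levels keeps earlier fusion results
            # present in the final, more discriminative fusion map.
            fused = out if fused is None else fused + out
        return fused


if __name__ == "__main__":
    dim, n_template, n_search = 128, 64, 128
    search = [torch.randn(1, n_search, dim) for _ in range(3)]
    template = [torch.randn(1, n_template, dim) for _ in range(3)]
    fusion_map = MultiLevelFusion(dim)(search, template)
    print(fusion_map.shape)  # torch.Size([1, 128, 128])

In such a layout, the resulting fusion map would be handed to a prediction head that regresses the 3D bounding box; that head is omitted here since the abstract does not detail it.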

Keywords