Journal of Shanghai Jiao Tong University (Shanghai Jiaotong Daxue Xuebao), Nov 2024
Vehicle-Road Collaborative Perception Method Based on Dual-Stream Feature Extraction
Abstract
To address the inadequate perception of autonomous vehicles in occluded and beyond-line-of-sight scenarios, a vehicle-road collaborative perception method based on a dual-stream feature extraction network is proposed to improve 3D object detection of traffic participants. Separate feature extraction networks are tailored to the characteristics of the roadside and vehicle-side scenes. Since the roadside has ample sensing data and computational resources, a Transformer structure is used there to extract richer, higher-level feature representations. Given the limited computing capacity and strict real-time requirements of autonomous vehicles, the vehicle side employs partial convolution (PConv) to improve computational efficiency and introduces the Mamba-VSS module for efficient perception in complex environments. The vehicle side and the roadside collaborate by selectively sharing and fusing critical perceptual information under the guidance of confidence maps. When trained and tested on the DAIR-V2X dataset, the vehicle-side feature extraction network has a model size of 8.1 MB, and the detection average precision (AP) is 67.67% and 53.74% at IoU thresholds of 0.5 and 0.7, respectively. The experiments demonstrate the method's advantages in detection accuracy and model size, and it offers a vehicle-road collaborative detection scheme suited to lower-specification hardware.
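The abstract names partial convolution (PConv) as the source of the vehicle-side efficiency gain but gives no configuration. As a minimal sketch, assuming the PConv formulation popularized by FasterNet (a regular k×k convolution applied to only the first C/n_div channels while the remaining channels pass through unchanged), the idea looks like this in NumPy. The split ratio `n_div=4`, the 3×3 kernel, and the helper names `conv2d_same`, `partial_conv`, and `pconv_flops_ratio` are illustrative assumptions, not details confirmed by this paper.

```python
import numpy as np


def conv2d_same(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Naive stride-1 convolution (cross-correlation, as in DL frameworks) with zero padding.

    x: (C_in, H, W) feature map; w: (C_out, C_in, k, k) kernel with odd k.
    Returns a (C_out, H, W) map, i.e. the same spatial size as the input.
    """
    c_out, c_in, k, _ = w.shape
    assert x.shape[0] == c_in and k % 2 == 1
    pad = k // 2
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # zero padding on H and W only
    out = np.zeros((c_out, h, wd), dtype=np.result_type(x, w))
    for i in range(k):
        for j in range(k):
            # Shifted window of the padded input aligned with kernel tap (i, j).
            patch = xp[:, i:i + h, j:j + wd]
            # Contract over input channels for every output channel.
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], patch)
    return out


def partial_conv(x: np.ndarray, w: np.ndarray, n_div: int = 4) -> np.ndarray:
    """PConv sketch (hypothetical helper): convolve the first C // n_div channels only."""
    c = x.shape[0]
    cp = c // n_div
    assert w.shape[:2] == (cp, cp), "kernel must map cp -> cp channels"
    head = conv2d_same(x[:cp], w)  # only this slice costs convolution FLOPs
    return np.concatenate([head, x[cp:]], axis=0)  # untouched channels are passed through


def pconv_flops_ratio(n_div: int = 4) -> float:
    """FLOPs of PConv relative to a full k x k conv over all C channels: (cp / C) ** 2."""
    return 1.0 / n_div ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((16, 8, 8))    # hypothetical 16-channel feature map
    w = rng.standard_normal((4, 4, 3, 3))  # 3x3 kernel acting on 16 // 4 = 4 channels
    y = partial_conv(x, w)
    print(y.shape, np.array_equal(y[4:], x[4:]), pconv_flops_ratio())
```

With `n_div=4` the convolution touches a quarter of the channels on both its input and output side, so its FLOPs are about 1/16 of a regular convolution over all C channels. This is the kind of saving that makes a small vehicle-side backbone plausible.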
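The abstract states only that critical perceptual information is shared selectively and fused under confidence-map guidance. A minimal sketch of one plausible scheme, similar in spirit to spatial-confidence-based communication such as Where2comm: the roadside transmits only bird's-eye-view (BEV) cells whose confidence exceeds a threshold (optionally capped at the top-k cells to bound bandwidth), and the vehicle fuses each received cell with its own feature by a confidence-weighted average. The threshold, the top-k cap, the weighted-average fusion rule, and the names `select_messages` and `fuse` are hypothetical; the paper's actual selection and fusion operators may differ. The sketch also assumes both feature maps are already aligned in the vehicle's BEV frame.

```python
import numpy as np


def select_messages(feat, conf, threshold=0.5, max_cells=None):
    """Roadside side (hypothetical): keep only BEV cells whose confidence exceeds `threshold`.

    feat: (C, H, W) BEV feature map, assumed already aligned to the vehicle's frame.
    conf: (H, W) confidence map in [0, 1]; `threshold` is assumed to be >= 0.
    Returns flat cell indices (K,), feature vectors (K, C) and confidences (K,).
    """
    flat_conf = conf.ravel()
    idx = np.flatnonzero(flat_conf > threshold)
    if max_cells is not None and idx.size > max_cells:
        # Bandwidth cap: keep only the most confident cells.
        order = np.argsort(flat_conf[idx])[::-1][:max_cells]
        idx = idx[order]
    c = feat.shape[0]
    vecs = feat.reshape(c, -1)[:, idx].T  # (K, C): one feature vector per sent cell
    return idx, vecs, flat_conf[idx]


def fuse(ego_feat, ego_conf, idx, vecs, msg_conf):
    """Vehicle side (hypothetical): confidence-weighted average at received cells."""
    c, h, w = ego_feat.shape
    fused = ego_feat.reshape(c, -1).copy()
    own = ego_conf.ravel()[idx]
    # msg_conf > threshold >= 0, so the denominator is strictly positive.
    alpha = msg_conf / (own + msg_conf)
    fused[:, idx] = (1.0 - alpha) * fused[:, idx] + alpha * vecs.T
    return fused.reshape(c, h, w)  # cells not received keep the ego features


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    road_feat = rng.standard_normal((8, 4, 4))
    road_conf = np.zeros((4, 4))
    road_conf[1, 2] = 0.9  # two confident cells, e.g. an occluded vehicle
    road_conf[3, 3] = 0.7
    idx, vecs, mconf = select_messages(road_feat, road_conf)
    ego_feat = rng.standard_normal((8, 4, 4))
    fused = fuse(ego_feat, np.full((4, 4), 0.3), idx, vecs, mconf)
    print("cells sent:", idx.size, "of", road_conf.size)
```

The payoff of confidence-guided selection is that the transmitted volume scales with the number of confident cells rather than with the full map size. The weighted average lets the roadside dominate exactly where it is more certain than the vehicle, for instance behind an occluder.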
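The two reported AP values differ because a detection counts as a true positive only when its IoU with a ground-truth box reaches the threshold, and 0.7 is stricter than 0.5. The sketch below illustrates this under simplifying assumptions: axis-aligned BEV boxes `(x_min, y_min, x_max, y_max)` in place of the oriented 3D boxes a real evaluation would use, VOC-style greedy matching by descending score, and all-point interpolated AP. The box format and the names `iou_bev` and `average_precision` are illustrative, not the paper's evaluation code.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in metres, BEV


def iou_bev(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned BEV boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: Sequence[Tuple[float, Box]], gts: List[Box], iou_thr: float) -> float:
    """AP at one IoU threshold: greedy matching by descending score, all-point interpolation."""
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    tp = []
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        ious = [iou_bev(box, g) for g in gts]
        j = max(range(len(gts)), key=lambda k: ious[k])  # best-overlapping ground truth
        if ious[j] >= iou_thr and not matched[j]:
            matched[j] = True
            tp.append(1)
        else:
            tp.append(0)  # overlap too low, or a duplicate of an already-matched box
    precisions, recalls, hits = [], [], 0
    for n, t in enumerate(tp, start=1):
        hits += t
        precisions.append(hits / n)
        recalls.append(hits / len(gts))
    # Make precision non-increasing in recall, then integrate over the recall steps.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_r) * p
        prev_r = r
    return ap


if __name__ == "__main__":
    gts = [(0.0, 0.0, 4.0, 2.0), (10.0, 0.0, 14.0, 2.0)]  # two 4 m x 2 m cars
    preds = [(0.9, (1.0, 0.0, 5.0, 2.0)),                 # 1 m offset: IoU = 0.6
             (0.8, (10.0, 0.0, 14.0, 2.0))]               # exact match: IoU = 1.0
    for thr in (0.5, 0.7):
        print(f"AP@{thr}: {average_precision(preds, gts, thr):.2f}")
```

For a 4 m × 2 m car, a prediction offset by 1 m along its length has IoU 6/10 = 0.6. It is a true positive at the 0.5 threshold but a false positive at 0.7, so this toy example yields AP 1.00 at 0.5 and 0.25 at 0.7. The same effect explains why the paper's AP at 0.7 is the lower of its two figures.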
Keywords