IEEE Access (Jan 2025)
Transformer-Based Sensor Fusion for Autonomous Vehicles: A Comprehensive Review
Abstract
Sensor fusion is vital for many critical applications, including robotics, autonomous driving, and aerospace. Integrating data streams from different sensors makes it possible to overcome the intrinsic limitations of each sensor, yielding more reliable measurements and reducing uncertainty. Moreover, deep learning-based sensor fusion has unlocked multimodal learning, which exploits complementary sensor modalities to improve object detection. Yet adverse weather conditions remain a significant challenge to the reliability of sensor fusion. Introducing the Transformer deep learning model into sensor fusion presents a promising avenue for advancing its sensing capabilities and potentially overcoming this challenge. Transformer models have proved powerful in modeling vision, language, and numerous other domains; however, they suffer from high latency and heavy computational requirements. This paper aims to provide: 1) an extensive overview of sensor fusion and Transformer models; 2) an in-depth survey of state-of-the-art (SoTA) methods for Transformer-based sensor fusion, focusing on camera-LiDAR and camera-radar approaches; and 3) a quantitative analysis of the SoTA methods, uncovering research gaps and stimulating future work.
Keywords