IEEE Access (Jan 2024)
Constructing 3D Object Detectors Based on Deformable Convolution-Guided Depth
Abstract
This paper introduces a depth-guided 3D object detection method that strengthens the feature extraction capability of the backbone network under weak supervision. It combines large-kernel convolution, global response normalization, and layer normalization to improve feature robustness in the weakly supervised setting. The depth estimation module's feature extraction is further strengthened by optimizing the depth-guided encoder and incorporating large-kernel depthwise separable convolutions together with a spatial attention mechanism. On the decoder side, deformable convolutions modulate the deep feature maps, which reduces training and inference time and keeps model complexity low. This design avoids the complexity associated with transformer architectures. Experiments on the KITTI 3D dataset show that the method reduces reliance on manual labeling and notably improves detection accuracy while also increasing processing speed.
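To make the abstract's building blocks concrete, the following PyTorch sketch shows one way the named components could fit together: a backbone block with a large-kernel depthwise convolution, layer normalization, and global response normalization, and a decoder step in which offsets predicted from depth features drive a modulated deformable convolution (torchvision.ops.DeformConv2d). The module names, kernel sizes, channel widths, and wiring are illustrative assumptions, not the authors' released implementation.

# Illustrative sketch only; block structure, kernel sizes, and the depth-guided
# offset prediction are assumptions made for exposition, not the paper's code.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class GRN(nn.Module):
    """Global response normalization over channels-last features."""

    def __init__(self, dim: int):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1, 1, 1, dim))
        self.beta = nn.Parameter(torch.zeros(1, 1, 1, dim))

    def forward(self, x):                      # x: (N, H, W, C)
        gx = torch.norm(x, p=2, dim=(1, 2), keepdim=True)        # per-channel spatial L2 norm
        nx = gx / (gx.mean(dim=-1, keepdim=True) + 1e-6)         # normalize across channels
        return self.gamma * (x * nx) + self.beta + x


class LargeKernelBlock(nn.Module):
    """Backbone block: large-kernel depthwise conv + LayerNorm + GRN (hypothetical sizes)."""

    def __init__(self, dim: int, kernel_size: int = 7):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size,
                                padding=kernel_size // 2, groups=dim)  # depthwise, large kernel
        self.norm = nn.LayerNorm(dim)           # layer normalization over channels
        self.pwconv1 = nn.Linear(dim, 4 * dim)  # pointwise expansion
        self.act = nn.GELU()
        self.grn = GRN(4 * dim)                 # global response normalization
        self.pwconv2 = nn.Linear(4 * dim, dim)  # pointwise projection

    def forward(self, x):                       # x: (N, C, H, W)
        shortcut = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)               # channels-last for LN/GRN
        x = self.pwconv2(self.grn(self.act(self.pwconv1(self.norm(x)))))
        return shortcut + x.permute(0, 3, 1, 2)


class DeformableDecoderBlock(nn.Module):
    """Decoder step: offsets and masks predicted from depth features modulate deep feature maps."""

    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        k2 = kernel_size * kernel_size
        self.offset_mask = nn.Conv2d(dim, 3 * k2, kernel_size, padding=kernel_size // 2)
        self.deform = DeformConv2d(dim, dim, kernel_size, padding=kernel_size // 2)

    def forward(self, feat, depth_feat):        # both: (N, C, H, W)
        om = self.offset_mask(depth_feat)       # depth features predict sampling offsets + masks
        k2 = om.shape[1] // 3
        offset, mask = om[:, : 2 * k2], torch.sigmoid(om[:, 2 * k2:])
        return self.deform(feat, offset, mask)  # depth-guided modulated deformable convolution


if __name__ == "__main__":
    x = torch.randn(1, 64, 32, 32)              # image features
    d = torch.randn(1, 64, 32, 32)              # depth-branch features
    y = DeformableDecoderBlock(64)(LargeKernelBlock(64)(x), d)
    print(y.shape)                              # torch.Size([1, 64, 32, 32])

In this reading, the deformable convolution replaces attention-style feature modulation in the decoder, which is one way to obtain the claimed savings in model complexity relative to a transformer-based design.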
Keywords