Remote Sensing (Sep 2022)

SVASeg: Sparse Voxel-Based Attention for 3D LiDAR Point Cloud Semantic Segmentation

  • Lin Zhao,
  • Siyuan Xu,
  • Liman Liu,
  • Delie Ming,
  • Wenbing Tao

DOI
https://doi.org/10.3390/rs14184471
Journal volume & issue
Vol. 14, no. 18
p. 4471

Abstract

Read online

3D LiDAR has become an indispensable sensor in autonomous driving vehicles. In LiDAR-based 3D point cloud semantic segmentation, most voxel-based 3D segmentors cannot efficiently capture large amounts of context information, resulting in limited receptive fields and limiting their performance. To address this problem, a sparse voxel-based attention network is introduced for 3D LiDAR point cloud semantic segmentation, termed SVASeg, which captures large amounts of context information between voxels through sparse voxel-based multi-head attention (SMHA). The traditional multi-head attention cannot directly be applied to the non-empty sparse voxels. To this end, a hash table is built according to the incrementation of voxel coordinates to lookup the non-empty neighboring voxels of each sparse voxel. Then, the sparse voxels are grouped into different groups, and each group corresponds to a local region. Afterwards, position embedding, multi-head attention and feature fusion are performed for each group to capture and aggregate the context information. Based on the SMHA module, the SVASeg can directly operate on the non-empty voxels, maintaining a comparable computational overhead to the convolutional method. Extensive experimental results on the SemanticKITTI and nuScenes datasets show the superiority of SVASeg.

Keywords