International Journal of Applied Earth Observation and Geoinformation (Jul 2024)

Point and voxel cross perception with lightweight cosformer for large-scale point cloud semantic segmentation

  • Shuai Zhang,
  • Biao Wang,
  • Yiping Chen,
  • Shuhang Zhang,
  • Wuming Zhang

Journal volume & issue
Vol. 131
p. 103951

Abstract


Semantic segmentation of large-scale point clouds is crucial for advancing smart city infrastructure and supporting autonomous driving technology. However, existing semantic segmentation techniques designed for indoor environments often struggle to adapt to vast outdoor scenes. Moreover, networks for large-scale scenes face challenges such as limited receptive fields and computational complexity, hindering their ability to accurately perceive small target features. To address these challenges, we propose PVCFormer, a novel cross-attention architecture that leverages both point and voxel representations. By feeding concurrently sampled data at varying voxel resolutions into the network, PVCFormer enhances the segmentation of small-scale features while expanding the receptive field. Additionally, the cross-transformer block facilitates better fusion of point and voxel features, and the introduction of CosFormer improves the computational efficiency of the network. Simultaneously, we introduce SYSU9, a new dataset labeled with 9 categories and covering an area of over 7 square kilometers, to serve as a benchmark for evaluating point cloud semantic segmentation algorithms. We propose two model versions, PVCFormer-CA and PVCFormer-SA. PVCFormer-CA achieves an overall accuracy of 92.4% on SensatUrban, 94.6% on DALES, and 91.1% on SYSU9. For semantic segmentation, PVCFormer-CA achieves 61.5% mIoU on SensatUrban, 73.6% mIoU on DALES, and 62.4% mIoU on SYSU9. Our experiments demonstrate promising results in large-scale outdoor point cloud semantic segmentation and introduce novel methodologies leveraging attention mechanisms for handling large-scale point clouds.

Keywords