International Journal of Applied Earth Observations and Geoinformation (Sep 2024)
Point cloud semantic segmentation with adaptive spatial structure graph transformer
Abstract
With the rapid development of LiDAR and artificial intelligence technologies, 3D point cloud semantic segmentation has become a highlight research topic. This technology is able to significantly enhance the capabilities of building information modeling, navigation and environmental perception. However, current deep learning-based methods primarily rely on voxelization or multi-layer convolution for feature extraction. These methods often face challenges in effectively differentiating between homogeneous objects or structurally adherent targets in complex real-world scenes. To this end, we propose a Graph Transformer point cloud semantic segmentation network (ASGFormer) tailored for structurally adherent objects. Firstly, ASGFormer combines Graph and Transformer to promote global correlation understanding in the graph. Secondly, spatial index and position embedding are constructed based on distance relationships and feature differences. Through a learnable mechanism, the structural weights between points are dynamically adjusted, achieving adaptive spatial structure within the graph. Finally, dummy nodes are introduced to facilitate global information storage and transmission between layers, effectively addressing the issue of information loss at the terminal nodes of the graph. Comprehensive experiments are conducted on the various real-world 3D point cloud datasets, analyzing the effectiveness of proposed ASGFormer through qualitative and quantitative evaluations. ASGFormer outperforms existing approaches with of 91.3% for OA, 78.0% for mAcc, and 72.3% for mIoU on S3DIS dataset. Moreover, ASGFormer achieves 72.8%, 45.5%, 81.6%, 70.1% mIoU on ScanNet, City-Facade, Toronto 3D and Semantic KITTI dataset, respectively. Notably, the proposed method demonstrates effective differentiation of homogeneous structurally adherent objects, further contributing to the intelligent perception and modeling of complex scenes.