Mathematics (Sep 2024)

TSPconv-Net: Transformer and Sparse Convolution for 3D Instance Segmentation in Point Clouds

  • Xiaojuan Ning,
  • Yule Liu,
  • Yishu Ma,
  • Zhiwei Lu,
  • Haiyan Jin,
  • Zhenghao Shi,
  • Yinghui Wang 

DOI
https://doi.org/10.3390/math12182926
Journal volume & issue
Vol. 12, no. 18
p. 2926

Abstract

Current deep learning approaches for indoor 3D instance segmentation often rely on multilayer perceptrons (MLPs) for feature extraction, but MLPs struggle to capture the complex spatial relationships inherent in 3D scene data. To address this issue, we propose TSPconv-Net, a novel and efficient end-to-end framework for 3D instance segmentation. Instead of relying primarily on MLPs, the framework uses a more robust feature extraction model that combines the offset-attention (OA) mechanism with submanifold sparse convolution (SSC). TSPconv-Net consists of a backbone network followed by a bounding box module: the backbone uses the OA mechanism to extract global features and SSC to extract local features, and the bounding box module then performs instance segmentation on the extracted features. Experimental results show that our approach outperforms existing work on the S3DIS dataset while maintaining computational efficiency. On the test set, TSPconv-Net achieves 68.6% mPrec, 52.5% mRec, and 60.1% mAP, surpassing 3D-BoNet by 3.0, 5.4, and 2.6 percentage points, respectively. It is also efficient, completing computation in 326 s.
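To make the global-feature branch more concrete, the following is a minimal NumPy sketch of one offset-attention layer, in which the difference between the input features and their self-attention output is transformed and added back to the input. The abstract does not specify the layer's internals, so everything here is an assumption for illustration: the function name `offset_attention`, the weight matrices, the feature sizes, the normalization order (softmax over one axis followed by L1 normalization over the other), and the omission of batch normalization. It should not be read as the authors' implementation.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def offset_attention(feats, w_q, w_k, w_v, w_o):
    """Hypothetical offset-attention layer.

    feats: (N, C) per-point features.
    w_q, w_k: (C, d) query/key projections; w_v, w_o: (C, C).
    Returns an (N, C) array of refined features.
    """
    q = feats @ w_q                      # (N, d) queries
    k = feats @ w_k                      # (N, d) keys
    v = feats @ w_v                      # (N, C) values

    scores = q @ k.T                     # (N, N) pairwise similarities
    attn = softmax(scores, axis=0)       # assumed: softmax over the first axis
    attn = attn / (1e-9 + attn.sum(axis=1, keepdims=True))  # then L1 over the second

    f_sa = attn @ v                      # (N, C) self-attention features
    offset = feats - f_sa                # the "offset" between input and attention
    # Linear + ReLU on the offset (batch norm omitted), plus a residual connection.
    return np.maximum(offset @ w_o, 0.0) + feats


# Toy usage with random weights: 128 points, 32 channels, 8-dim queries/keys.
rng = np.random.default_rng(0)
feats = rng.standard_normal((128, 32))
w_q = rng.standard_normal((32, 8)) * 0.1
w_k = rng.standard_normal((32, 8)) * 0.1
w_v = rng.standard_normal((32, 32)) * 0.1
w_o = rng.standard_normal((32, 32)) * 0.1
out = offset_attention(feats, w_q, w_k, w_v, w_o)
```

Because every point attends to every other point, a layer of this kind mixes information across the whole scene, which is the role the abstract assigns to OA; the local branch (SSC) would be a separate component and is not sketched here.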

Keywords