IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2024)
DE-Net: A Dual-Encoder Network for Local and Long-Distance Context Information Extraction in Semantic Segmentation of Large-Scale Scene Point Clouds
Abstract
Semantic segmentation of large-scale point clouds is essential for applications such as autonomous driving and high-definition mapping. However, this task remains challenging due to the imbalanced distribution of categories in large-scale point cloud data and the similarity in local geometric structures. Most current deep learning–based methods concentrate on designing local feature extraction modules while neglecting the significance of long-distance contextual information. Nevertheless, this contextual information is crucial for accurate object segmentation in large-scale scenes. To address this limitation, we propose a dual-encoder segmentation network called DE-Net. DE-Net effectively learns both the local and long-distance contextual information for each point to achieve accurate point segmentation. DE-Net consists of two main components: dual-encoder modules (DEMs) and gradient-aware pooling modules (GAPM). DEMs extract local geometry and long-distance contextual information for each point using positional and trigonometric encoding to distinguish complex geometric features. GAPMs aggregate global information effectively using dual-distance and xy gradient information. In addition, a prediction jitter module was introduced during training to address the issue of class imbalance and improve the network's prediction results. The experimental results on three public benchmarks demonstrate that DE-Net outperforms existing state-of-the-art methods, achieving mean intersection over union scores of 83.5%, 61.8%, and 63.9% on Toronto-3D, WHU-MLS, and S3DIS datasets, respectively.
Keywords