International Journal of Applied Earth Observation and Geoinformation (Aug 2023)

MVPNet: A multi-scale voxel-point adaptive fusion network for point cloud semantic segmentation in urban scenes

  • Huchen Li,
  • Haiyan Guan,
  • Lingfei Ma,
  • Xiangda Lei,
  • Yongtao Yu,
  • Hanyun Wang,
  • Mahmoud Reza Delavar,
  • Jonathan Li

Journal volume & issue
Vol. 122
Art. no. 103391

Abstract


Point cloud semantic segmentation, which contributes to scene understanding at different scales, is crucial for three-dimensional reconstruction and digital twin cities. However, most current semantic segmentation methods extract multi-scale features through down-sampling operations, so the feature maps at a given scale carry only a single receptive field, which leads to the misclassification of spatially similar objects. To effectively capture the geometric features and the semantic information of different receptive fields, a multi-scale voxel-point adaptive fusion network (MVPNet) is proposed for point cloud semantic segmentation in urban scenes. First, a multi-scale voxel fusion module with a gating mechanism is designed to explore the semantic representation ability of different receptive fields. Then, a geometric self-attention module is constructed to deeply fuse fine-grained point features with coarse-grained voxel features. Finally, a pyramid decoder is introduced to aggregate context information at different scales and enhance feature representation. The proposed MVPNet was evaluated on three datasets, Toronto3D, WHU-MLS, and SensatUrban, and outperformed state-of-the-art (SOTA) methods. On the public Toronto3D and SensatUrban datasets, MVPNet achieved mIoUs of 84.14% and 59.40%, and overall accuracies of 98.12% and 93.30%, respectively.
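To make the gated fusion idea concrete, the sketch below shows one plausible rendering of scale-wise gated fusion of voxel features in PyTorch. It assumes the per-point voxel features from each scale have already been interpolated back to the input points; the class name, tensor shapes, and channel counts are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): gated fusion of voxel features
# extracted at several receptive-field scales, as described in the abstract.
# All names, shapes, and hyper-parameters are illustrative assumptions.
import torch
import torch.nn as nn

class GatedMultiScaleVoxelFusion(nn.Module):
    """Fuse per-point voxel features from K scales with a learned gate."""
    def __init__(self, channels: int, num_scales: int = 3):
        super().__init__()
        # One gate weight per scale, normalized across scales with softmax.
        self.gate = nn.Sequential(
            nn.Linear(channels * num_scales, num_scales),
            nn.Softmax(dim=-1),
        )

    def forward(self, scale_feats: list) -> torch.Tensor:
        # scale_feats: K tensors of shape (N, C), one per voxel scale,
        # already interpolated back to the N input points.
        stacked = torch.stack(scale_feats, dim=1)             # (N, K, C)
        weights = self.gate(stacked.flatten(1))               # (N, K)
        return (weights.unsqueeze(-1) * stacked).sum(dim=1)   # (N, C)

# Usage: three 64-channel feature sets for 4096 points.
feats = [torch.randn(4096, 64) for _ in range(3)]
fused = GatedMultiScaleVoxelFusion(channels=64, num_scales=3)(feats)
print(fused.shape)  # torch.Size([4096, 64])
```

The softmax gate lets each point weight the receptive-field scale that best describes its local context, which is one simple way to realize the adaptive multi-scale behavior the abstract describes; the paper's actual module may differ.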

Keywords