IET Computer Vision (Apr 2024)

Point cloud semantic segmentation based on local feature fusion and multilayer attention network

  • Junjie Wen,
  • Jie Ma,
  • Yuehua Zhao,
  • Tong Nie,
  • Mengxuan Sun,
  • Ziming Fan

DOI
https://doi.org/10.1049/cvi2.12255
Journal volume & issue
Vol. 18, no. 3
pp. 381–392

Abstract


Semantic segmentation of three-dimensional point clouds is vital in autonomous driving, computer vision, and augmented reality. However, current semantic segmentation methods do not effectively use the point cloud's local geometric features and contextual information, which are essential for improving segmentation accuracy. To address these challenges, a semantic segmentation network based on local feature fusion and a multilayer attention mechanism is proposed. Specifically, the authors design a local feature fusion module that encodes geometric and feature information separately, fully leveraging the point cloud's feature perception and geometric structure representation. Furthermore, the authors design a multilayer attention pooling module, consisting of local attention pooling and cascade attention pooling, to extract contextual information: local attention pooling learns local neighbourhood information, while cascade attention pooling captures contextual information from deeper local neighbourhoods. Finally, an enhanced feature representation of important information is obtained by aggregating the features from the two deep attention pooling methods. Extensive experiments on the large-scale point-cloud datasets Stanford 3D Large-Scale Indoor Spaces (S3DIS) and SemanticKITTI indicate that the authors' network offers clear advantages over existing representative methods in describing local geometric features and modelling global contextual relationships.
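The core mechanism the abstract describes, attention pooling over a point's local neighbourhood, can be illustrated with a minimal sketch. This is not the authors' implementation: the scoring weights `w` stand in for a learned MLP, and the shapes (N points, K neighbours, C channels) are illustrative assumptions. Scores are normalised over the K neighbours with a softmax, then used to compute a weighted sum of neighbour features.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def local_attention_pool(neigh_feats, w):
    """Attention-style pooling over each point's K neighbours.

    neigh_feats: (N, K, C) features of the K neighbours of each point.
    w:           (C, C) scoring weights (placeholder for a learned MLP).
    Returns:     (N, C) aggregated per-point features.
    """
    scores = neigh_feats @ w                  # (N, K, C) per-channel scores
    attn = softmax(scores, axis=1)            # normalise over the K neighbours
    return (attn * neigh_feats).sum(axis=1)   # attention-weighted sum -> (N, C)

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8, 16))   # 4 points, 8 neighbours, 16 channels
w = rng.normal(size=(16, 16))
pooled = local_attention_pool(feats, w)
print(pooled.shape)  # (4, 16)
```

Unlike plain max or average pooling, the softmax weights let the network emphasise the most informative neighbours per channel; the paper's cascade attention pooling repeats this idea over deeper (wider) neighbourhoods before the two pooled features are aggregated.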

Keywords