Sensors (Aug 2024)
BMSeNet: Multiscale Context Pyramid Pooling and Spatial Detail Enhancement Network for Real-Time Semantic Segmentation
Abstract
Most real-time semantic segmentation networks adopt shallow architectures to achieve fast inference, but this limits the network's receptive field. In addition, features are typically extracted at a single scale, which weakens generalization and robustness, and the loss of spatial detail degrades segmentation accuracy. To address these limitations, this paper proposes a Multiscale Context Pyramid Pooling and Spatial Detail Enhancement Network (BMSeNet). First, to overcome the restriction to single-scale semantic features, a Multiscale Context Pyramid Pooling Module (MSCPPM) is introduced; by combining several pooling operations, it efficiently enlarges the receptive field and aggregates multiscale contextual information. Second, a Spatial Detail Enhancement Module (SDEM) is designed to compensate for lost spatial detail and strengthen the network's perception of it. Finally, a Bilateral Attention Fusion Module (BAFM) is proposed, which uses pixel positional correlations to guide the network in weighting the features extracted from the two branches and merging them effectively. Extensive experiments on the Cityscapes and CamVid datasets show that BMSeNet achieves a good balance between inference speed and segmentation accuracy, outperforming some state-of-the-art real-time semantic segmentation methods.
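The two ideas named in the abstract, pooling at several scales to gather context and weighting two branches per pixel before merging them, can be illustrated with a minimal sketch. This is not the authors' MSCPPM or BAFM: the function names, the bin sizes (1, 2, 4), the additive aggregation, and the sigmoid gate are all assumptions chosen for illustration, and the code operates on a single-channel map stored as a list of rows rather than on real network tensors.

```python
import math

def adaptive_avg_pool(fmap, bins):
    """Average-pool a 2D map (list of rows) down to a bins x bins grid."""
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(bins):
        # Window rows: floor of the start, ceiling of the end.
        r0, r1 = (i * h) // bins, -(-((i + 1) * h) // bins)
        row = []
        for j in range(bins):
            c0, c1 = (j * w) // bins, -(-((j + 1) * w) // bins)
            cells = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled

def upsample_nearest(grid, h, w):
    """Nearest-neighbour upsampling of a square grid back to h x w."""
    bins = len(grid)
    return [[grid[(r * bins) // h][(c * bins) // w] for c in range(w)]
            for r in range(h)]

def multiscale_context(fmap, bin_sizes=(1, 2, 4)):
    """Add context pooled at several scales to the input map.

    Assumption: scales are aggregated by plain addition; the paper's
    module may combine them differently.
    """
    h, w = len(fmap), len(fmap[0])
    out = [row[:] for row in fmap]
    for bins in bin_sizes:
        ctx = upsample_nearest(adaptive_avg_pool(fmap, bins), h, w)
        for r in range(h):
            for c in range(w):
                out[r][c] += ctx[r][c]
    return out

def gated_fusion(detail, context):
    """Fuse two branches with a per-pixel weight.

    Assumption: the weight is sigmoid(detail * context), a simple
    stand-in for an attention score based on pixel correlation.
    """
    fused = []
    for d_row, c_row in zip(detail, context):
        row = []
        for d, c in zip(d_row, c_row):
            wgt = 1.0 / (1.0 + math.exp(-d * c))
            row.append(wgt * d + (1.0 - wgt) * c)
        fused.append(row)
    return fused
```

Pooling with a single bin yields the global mean, while larger bin counts retain coarse spatial layout, which is why combining several bin sizes widens the effective receptive field without deepening the network.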
Keywords