Enhanced multi-scale networks for semantic segmentation

Tianping Li; Zhaotong Cui; Yu Han; Guanxing Li; Meng Li; Dongmei Wei

doi:10.1007/s40747-023-01279-x

Complex & Intelligent Systems (Dec 2023)

Affiliations

Tianping Li: School of Physics and Electronics, Shandong Normal University
Zhaotong Cui: School of Physics and Electronics, Shandong Normal University
Yu Han: School of Physics and Electronics, Shandong Normal University
Guanxing Li: School of Physics and Electronics, Shandong Normal University
Meng Li: School of Physics and Electronics, Shandong Normal University
Dongmei Wei: School of Physics and Electronics, Shandong Normal University

DOI: https://doi.org/10.1007/s40747-023-01279-x
Journal volume & issue: Vol. 10, no. 2, pp. 2557–2568

Abstract

Multi-scale representation is an effective answer to the scale variation of objects and entities in semantic segmentation, and the ability to capture long-range pixel dependencies further improves segmentation quality. Semantic segmentation also requires effective use of pixel-to-pixel similarity in the channel direction to enhance pixel regions. Reviewing the characteristics of earlier successful segmentation models, we identify several elements that are crucial to performance: a robust encoder structure, multi-scale interactions, attention mechanisms, and a robust decoder structure. The asymmetric non-local neural network (ANNet) merges its attention mechanism with multi-scale pyramidal modules to accelerate segmentation while maintaining high accuracy. However, ANNet does not account for the similarity between pixels in the channel direction of the feature map, which leaves its segmentation accuracy unsatisfactory. We therefore propose EMSNet, a straightforward convolutional network architecture for semantic segmentation that consists of an integration of enhanced regional module (IERM) and a multi-scale convolution module (MSCM). The IERM generates weights from four- or five-stage feature maps and then fuses the input features with those weights, which requires additional computation. The similarity of the feature maps in the channel direction is also calculated using ANNet's auxiliary loss function. The MSCM describes the interactions between channels more accurately, captures the interdependencies between feature pixels, and captures multi-scale context. Experiments show that EMSNet performs well on benchmark datasets. On the Cityscapes test data, it reaches 82.2% segmentation accuracy. On the ADE20K and Pascal VOC datasets, the mIoU is 45.58% and 85.46%, respectively.
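
The abstract reports results as mIoU (mean intersection over union). As a reading aid, the following is a minimal sketch of how that metric is commonly computed from a confusion matrix. It is not the authors' evaluation code: the function names, the flattened label lists, and the ignore label 255 are assumptions made for illustration only.

```python
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], target: Sequence[int],
                     num_classes: int, ignore_index: int = 255) -> List[List[int]]:
    """Count pixels by (true class, predicted class).

    `ignore_index` is an assumed "unlabeled" marker; pixels whose ground
    truth carries it are skipped.
    """
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # true class c, predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, true class differs
        denom = tp + fp + fn
        if denom > 0:                               # skip classes absent from both pred and target
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes, last pixel unlabeled.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
score = mean_iou(confusion_matrix(pred, target, num_classes=3))
```

In this toy case the per-class IoUs are 1/2, 2/3, and 1, so `score` is their mean, 13/18. Benchmark toolkits accumulate the confusion matrix over the whole dataset before averaging, and may treat absent classes differently from this sketch.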

Published in Complex & Intelligent Systems

ISSN: 2199-4536 (Print); 2198-6053 (Online)
Publisher: Springer
Country of publisher: Switzerland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science; Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: https://www.springer.com/journal/40747