Enhancing feature fusion with spatial aggregation and channel fusion for semantic segmentation

Jie Hu; Huifang Kong; Lei Fan; Jun Zhou

doi:10.1049/cvi2.12026

IET Computer Vision (Sep 2021)

Affiliations

Jie Hu: School of Electrical Engineering and Automation, Hefei University of Technology, Hefei, China
Huifang Kong: School of Electrical Engineering and Automation, Hefei University of Technology, Hefei, China
Lei Fan: School of Electrical Engineering and Automation, Hefei University of Technology, Hefei, China
Jun Zhou: School of Electrical Engineering and Automation, Hefei University of Technology, Hefei, China

DOI: https://doi.org/10.1049/cvi2.12026
Journal volume & issue: Vol. 15, no. 6
pp. 418 – 427

Abstract

Semantic segmentation is crucial to autonomous driving because it provides accurate recognition and localisation of the surrounding scene for street‐scene understanding. Many existing segmentation networks fuse high‐level and low‐level features to boost segmentation performance. However, simple fusion may yield only a limited improvement because of the gap between high‐level and low‐level features. To alleviate this limitation, we propose spatial aggregation and channel fusion to bridge the gap. Our implementation, inspired by the attention mechanism, consists of two steps: (1) Spatial aggregation relies on the proposed pyramid spatial context aggregation module to capture spatial similarities and enhance the spatial representation of high‐level features, which makes the subsequent fusion more effective. (2) Channel fusion relies on the proposed attention‐based channel fusion module to weight channel maps at different levels and enhance the fusion. In addition, a complete network with a U‐shaped structure is constructed. A series of ablation experiments demonstrates the effectiveness of our designs, and the network achieves an mIoU of 81.4% on the Cityscapes test set and 84.6% on the PASCAL VOC 2012 test set.
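
The two steps named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the abstract does not specify the internals of the pyramid spatial context aggregation module or the attention‐based channel fusion module. The function names `spatial_aggregate` and `channel_fuse` are hypothetical, the spatial step is assumed to be a generic dot‐product self‐attention over positions with a residual connection, and the channel step is assumed to be a sigmoid gate on globally pooled features with no learned weights.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def spatial_aggregate(feat):
    """Assumed spatial step: generic self-attention over positions.

    feat is a list of C channels, each a list of N flattened positions.
    Every position receives a similarity-weighted sum of all positions,
    added back to the input as a residual.
    """
    C, N = len(feat), len(feat[0])
    # Dot-product similarity between every pair of positions (N x N).
    sim = [[sum(feat[c][i] * feat[c][j] for c in range(C)) for j in range(N)]
           for i in range(N)]
    attn = [softmax(row) for row in sim]
    return [[feat[c][i] + sum(attn[i][j] * feat[c][j] for j in range(N))
             for i in range(N)]
            for c in range(C)]


def channel_fuse(high, low):
    """Assumed channel step: per-channel sigmoid gate from global pooling.

    The gate decides how much of the high-level channel versus the
    low-level channel enters the fused output. A real module would
    normally learn this gate; here it is computed directly (assumption).
    """
    fused = []
    for h, l in zip(high, low):
        pooled = sum(a + b for a, b in zip(h, l)) / len(h)  # global average
        gate = 1.0 / (1.0 + math.exp(-pooled))              # value in (0, 1)
        fused.append([gate * a + (1.0 - gate) * b for a, b in zip(h, l)])
    return fused


# Toy input: 2 channels, 2 spatial positions per channel.
high = [[1.0, 0.0], [0.0, 1.0]]
low = [[0.5, 0.5], [0.5, 0.5]]
fused = channel_fuse(spatial_aggregate(high), low)
```

In this sketch the high‐level features are spatially aggregated first and only then fused with the low‐level features, which mirrors the order of the two steps in the abstract.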

Published in IET Computer Vision

ISSN: 1751-9632 (Print); 1751-9640 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: https://ietresearch.onlinelibrary.wiley.com/journal/17519640
