Semantic segmentation via pixel‐to‐center similarity calculation

Dongyue Wu; Zilin Guo; Aoyan Li; Changqian Yu; Nong Sang; Changxin Gao

doi:10.1049/cit2.12245

CAAI Transactions on Intelligence Technology (Feb 2024)

Affiliations

Dongyue Wu: National Key Laboratory of Science and Technology on Multispectral Information Processing, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China
Zilin Guo: National Key Laboratory of Science and Technology on Multispectral Information Processing, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China
Aoyan Li: National Key Laboratory of Science and Technology on Multispectral Information Processing, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China
Changqian Yu: Meituan Inc., Beijing, China
Nong Sang: National Key Laboratory of Science and Technology on Multispectral Information Processing, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China
Changxin Gao: National Key Laboratory of Science and Technology on Multispectral Information Processing, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China

DOI: https://doi.org/10.1049/cit2.12245
Journal volume & issue: Vol. 9, no. 1
pp. 87 – 100

Abstract

Since the fully convolutional network achieved great success in semantic segmentation, many works have been proposed to extract discriminative pixel representations. However, the authors observe that existing methods still suffer from two typical challenges: (i) the intra‐class feature variation between different scenes may be large, which makes it difficult to keep same‐class pixels from different scenes consistent; (ii) the inter‐class feature distinction within the same scene may be small, which limits the ability to distinguish different classes in each scene. The authors first rethink semantic segmentation from the perspective of similarity between pixels and class centers. Each weight vector of the segmentation head represents its corresponding semantic class over the whole dataset and can be regarded as the embedding of the class center. Thus, pixel‐wise classification amounts to computing the similarity, in the final feature space, between pixels and the class centers. Under this view, the authors propose a Class Center Similarity (CCS) layer that addresses the above‐mentioned challenges by generating adaptive class centers conditioned on each scene and by supervising the similarities between class centers. The CCS layer utilises the Adaptive Class Center Module to generate class centers conditioned on each scene, which adapt to the large intra‐class variation between different scenes. A specially designed Class Distance Loss (CD Loss) is introduced to control both inter‐class and intra‐class distances based on the predicted center‐to‐center and pixel‐to‐center similarities. Finally, the CCS layer outputs the processed pixel‐to‐center similarity as the segmentation prediction. Extensive experiments demonstrate that the proposed model performs favourably against state‐of‐the‐art methods.
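
The pixel‐to‐center view described in the abstract can be illustrated with a minimal sketch in plain Python. It treats each row of a segmentation head's weights as a class center and classifies each pixel by its dot‐product similarity to those centers. The function `scene_adaptive_centers` is a hypothetical stand‐in for scene‐conditioned centers (a softmax‐weighted average of pixel features); it is an assumption made for illustration and is not the Adaptive Class Center Module from the paper, whose details the abstract does not give. All names and the toy data are likewise illustrative.

```python
import math


def dot(u, v):
    """Dot product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def pixel_to_center_similarity(pixels, centers):
    """For every pixel feature, return its similarity to every class center.

    With `centers` taken as the weight vectors of a linear segmentation
    head (bias omitted), these similarities are exactly the head's logits.
    """
    return [[dot(p, c) for c in centers] for p in pixels]


def softmax(scores):
    """Numerically stable softmax over one list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(pixels, centers):
    """Assign each pixel to the class whose center is most similar."""
    sims = pixel_to_center_similarity(pixels, centers)
    return [max(range(len(row)), key=row.__getitem__) for row in sims]


def scene_adaptive_centers(pixels, centers):
    """HYPOTHETICAL scene-conditioned centers (an assumption, not the paper's module).

    Pixels are softly assigned to the dataset-level centers, and each new
    center is the assignment-weighted mean of the pixel features.
    """
    probs = [softmax(row) for row in pixel_to_center_similarity(pixels, centers)]
    dim = len(pixels[0])
    adapted = []
    for k in range(len(centers)):
        weight = sum(p[k] for p in probs)  # always > 0 because softmax > 0
        adapted.append([
            sum(probs[i][k] * pixels[i][d] for i in range(len(pixels))) / weight
            for d in range(dim)
        ])
    return adapted


# Toy scene: four pixels with 2-D features, two classes.
global_centers = [[1.0, 0.0], [0.0, 1.0]]
pixels = [[0.9, 0.2], [0.8, 0.1], [0.1, 0.7], [0.3, 0.9]]

labels = predict(pixels, global_centers)
scene_centers = scene_adaptive_centers(pixels, global_centers)
scene_labels = predict(pixels, scene_centers)
```

Here `labels` comes from the dataset‐level centers alone, while `scene_labels` uses centers pulled toward the features actually present in the scene. A real model would compute both on learned feature maps and would add a loss on center‐to‐center and pixel‐to‐center similarities, which this sketch does not attempt.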

Published in CAAI Transactions on Intelligence Technology

ISSN: 2468-2322 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Language and Literature: Philology. Linguistics: Computational linguistics. Natural language processing; Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: https://ietresearch.onlinelibrary.wiley.com/journal/24682322
