CAAI Transactions on Intelligence Technology (Feb 2024)
Semantic segmentation via pixel‐to‐center similarity calculation
Abstract
Since the fully convolutional network achieved great success in semantic segmentation, many works have been proposed to extract discriminative pixel representations. However, the authors observe that existing methods still suffer from two typical challenges: (i) the intra‐class feature variation between different scenes may be large, making it difficult to maintain consistency between same‐class pixels from different scenes; (ii) the inter‐class feature distinction within the same scene may be small, limiting the ability to distinguish different classes in each scene. The authors first rethink semantic segmentation from the perspective of similarity between pixels and class centers. Each weight vector of the segmentation head represents its corresponding semantic class over the whole dataset and can therefore be regarded as the embedding of that class center. Thus, pixel‐wise classification amounts to computing the similarity between pixels and class centers in the final feature space. Under this novel view, the authors propose a Class Center Similarity (CCS) layer that addresses the above challenges by generating adaptive class centers conditioned on each scene and by supervising the similarities between class centers. The CCS layer utilises an Adaptive Class Center Module to generate class centers conditioned on each scene, which adapts to the large intra‐class variation between different scenes. A specially designed Class Distance Loss (CD Loss) controls both inter‐class and intra‐class distances based on the predicted center‐to‐center and pixel‐to‐center similarities. Finally, the CCS layer outputs the processed pixel‐to‐center similarity as the segmentation prediction. Extensive experiments demonstrate that the model performs favourably against state‐of‐the‐art methods.
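To make the similarity view concrete: with pixel feature f_i and segmentation-head weight vectors w_1, …, w_C, the class-c logit z_{i,c} = w_c^T f_i is a similarity score between the pixel and class center c, so prediction reduces to a pixel-to-center matching. The sketch below illustrates this in PyTorch, including scene-conditioned centers estimated by soft masked pooling over a coarse prediction. This is a minimal sketch under our own assumptions, not the authors' implementation; all names (AdaptiveClassCenter, coarse_logits, tau) and the specific pooling and combination choices are hypothetical.

```python
# Minimal sketch (not the authors' code) of a class-center-similarity layer:
# (i) estimate per-scene class centers from a coarse prediction, and
# (ii) output pixel-to-center cosine similarities as the segmentation logits.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveClassCenter(nn.Module):
    def __init__(self, num_classes: int, dim: int, tau: float = 0.1):
        super().__init__()
        # Dataset-level centers: one embedding per class, as in a linear head.
        self.global_centers = nn.Parameter(torch.randn(num_classes, dim))
        self.tau = tau  # temperature for the similarity logits

    def forward(self, feats: torch.Tensor, coarse_logits: torch.Tensor):
        # feats: (B, D, H, W) pixel embeddings; coarse_logits: (B, C, H, W)
        B, D, H, W = feats.shape
        attn = coarse_logits.softmax(dim=1).flatten(2)          # (B, C, HW)
        # Normalise per class over space -> soft masked average pooling.
        attn = attn / attn.sum(dim=-1, keepdim=True).clamp_min(1e-6)
        scene_centers = torch.einsum('bcn,bdn->bcd', attn, feats.flatten(2))
        # Combine scene-conditioned and dataset-level centers (one simple choice).
        centers = scene_centers + self.global_centers.unsqueeze(0)  # (B, C, D)
        # Pixel-to-center cosine similarity as the segmentation prediction.
        f = F.normalize(feats.flatten(2), dim=1)                # (B, D, HW)
        c = F.normalize(centers, dim=-1)                        # (B, C, D)
        sim = torch.einsum('bcd,bdn->bcn', c, f) / self.tau     # (B, C, HW)
        return sim.reshape(B, -1, H, W)
```

For example, with feats of shape (2, 256, 64, 64) and coarse_logits of shape (2, 19, 64, 64), the layer returns (2, 19, 64, 64) similarity logits that can be trained with cross-entropy; a CD-Loss-style term would additionally penalise high center-to-center similarity between different classes while encouraging high pixel-to-center similarity within a class. The exact formulation in the paper may differ.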
Keywords