IEEE Access (Jan 2024)
C-YOSO: Contrastive Query on Real-Time Panoptic Segmentation
Abstract
Panoptic segmentation, which combines instance and semantic segmentation, provides comprehensive image understanding for various tasks. Achieving real-time performance with high accuracy remains challenging: recent panoptic segmentation models run in real time but often achieve lower accuracy than established benchmarks. In this paper, we aim to improve the accuracy of “You Only Segment Once” (YOSO), the fastest panoptic segmentation model. Our model, C-YOSO, extends YOSO with a contrastive query decoder module that has two core components. First, the textual-guided query applies a contrastive loss between object queries and textual ground truth to boost accuracy. Second, the lightweight query decoder speeds up inference by using global average pooling (GAP) and $1\times 1$ convolutions. We evaluate C-YOSO (ours) against YOSO on the Cityscapes dataset. C-YOSO improves accuracy from 59.7 to 61.8 panoptic quality (PQ) while keeping inference speed nearly unchanged (11.1 versus 11.0 frames per second, FPS). Moreover, accuracy increases in almost all classes. To reach real-time operation, we halve the input size, achieving 22.3 FPS with 54.1 PQ. These results show that C-YOSO outperforms YOSO in accuracy (PQ) at comparable speed (FPS) and reaches real-time speed at the reduced input size.
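The abstract does not give the exact form of the contrastive loss, so the following is only a minimal sketch of how a loss between object queries and textual ground truth could look. It assumes an InfoNCE-style formulation over cosine similarities; the function names, the temperature value of 0.07, and the use of plain Python lists in place of learned query and text-encoder embeddings are all illustrative assumptions, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def contrastive_query_loss(queries, text_embeddings, targets, temperature=0.07):
    """Hypothetical InfoNCE-style loss (an assumption, not the paper's exact loss).

    queries:         list of query vectors, one per matched object query
    text_embeddings: list of class-name embeddings, one per class
    targets:         ground-truth class index for each query
    Each query is pulled toward the text embedding of its ground-truth
    class and pushed away from the embeddings of all other classes.
    """
    total = 0.0
    for query, target in zip(queries, targets):
        logits = [cosine(query, e) / temperature for e in text_embeddings]
        # Numerically stable log-sum-exp over all classes.
        max_logit = max(logits)
        log_sum = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        # Cross-entropy with the ground-truth class as the positive.
        total += log_sum - logits[target]
    return total / len(queries)
```

In this sketch, queries that align with the text embedding of their ground-truth class yield a loss near zero, while queries aligned with a wrong class are penalized heavily; a term of this kind would be added to the segmentation losses during training only, leaving inference cost unchanged.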
Keywords