Enhancing Query Formulation for Universal Image Segmentation

Yipeng Qu; Joohee Kim

doi:10.3390/s24061879

Sensors (Mar 2024)

Affiliations

Yipeng Qu: Department of Electrical and Computer Engineering, Illinois Institute of Technology, Chicago, IL 60616, USA
Joohee Kim: Department of Electrical and Computer Engineering, Illinois Institute of Technology, Chicago, IL 60616, USA

DOI: https://doi.org/10.3390/s24061879
Journal volume & issue: Vol. 24, no. 6, p. 1879

Abstract

Recent advances in image segmentation have been driven largely by Vision Transformers. These transformer-based models offer a single, versatile network structure capable of handling a variety of segmentation tasks. Despite their effectiveness, the pursuit of enhanced capabilities often leads to more intricate architectures and greater computational demands. OneFormer responded to these challenges by introducing a query-text contrastive learning strategy that is active during training only. However, this approach does not fully address the inefficiency of text generation and of the contrastive loss computation. To solve these problems, we introduce Efficient Query Optimizer (EQO), an approach that efficiently utilizes multi-modal data to refine query optimization in image segmentation. Our strategy significantly reduces the number of parameters and the amount of computation by distilling inter-class and inter-task information from an image into a single template sentence. Furthermore, we propose a novel attention-based contrastive loss designed to facilitate a one-to-many matching mechanism in the loss computation, which helps object queries learn more robust representations. Beyond reducing complexity, our model outperforms OneFormer on all three segmentation tasks using the Swin-T backbone. On the ADE20K dataset, our model outperforms OneFormer by 0.2% in mean Intersection over Union (mIoU), 0.6% in Average Precision (AP), and 0.8% in Panoptic Quality (PQ). These results highlight the efficacy of our model in advancing the field of image segmentation.
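
As a reader aid for the mIoU figure quoted above, the following is a minimal sketch of how mean Intersection over Union can be computed from per-pixel class labels. It is not taken from the paper: the function name `mean_iou`, the flat label lists, and the choice to skip classes absent from both prediction and ground truth are illustrative assumptions. Benchmark evaluations such as ADE20K typically accumulate intersections and unions over the whole dataset rather than per image.

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union for two equal-length label sequences.

    Hypothetical helper for illustration only; classes that appear in
    neither `pred` nor `target` are skipped (an assumption of this sketch).
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six "pixels", three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.3f}")
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5.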

Published in Sensors

ISSN: 1424-8220 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/sensors
