Cascading Alignment for Unsupervised Domain-Adaptive DETR with Improved DeNoising Anchor Boxes

Huantong Geng; Jun Jiang; Junye Shen; Mengmeng Hou

doi:10.3390/s22249629

Sensors (Dec 2022)

Affiliations

Huantong Geng: School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044, China
Jun Jiang: School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044, China
Junye Shen: School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044, China
Mengmeng Hou: School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044, China

Journal volume & issue: Vol. 22, no. 24, p. 9629

Abstract

Transformer-based object detection has recently attracted increasing interest and shown promising results. Among DETR-like models, DETR with improved denoising anchor boxes (DINO) achieved state-of-the-art performance on COCO val2017. However, it often struggles when applied to new scenarios where no annotated data are available and the imaging conditions differ significantly. To alleviate this domain shift, this paper proposes unsupervised domain-adaptive DINO via cascading alignment (CA-DINO), which consists of attention-enhanced double discriminators (AEDD) and weak-restraints on category-level token (WROT). Specifically, AEDD aggregates and aligns the local–global context from the feature representations of both domains, reducing the domain discrepancy before the features enter the transformer encoder and decoder. WROT extends the Deep CORAL loss to adapt class tokens after embedding, minimizing the difference in second-order statistics between the source and target domains. The approach is trained end to end, and experiments on two challenging benchmarks demonstrate its effectiveness; in particular, it yields a 41% relative improvement over the baseline on the Foggy Cityscapes benchmark.
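
The abstract says WROT builds on the Deep CORAL loss, which penalizes the gap between the feature covariances (second-order statistics) of the two domains. As a rough illustration only, the sketch below computes the standard Deep CORAL loss, the squared Frobenius distance between source and target covariance matrices scaled by 1/(4d^2), in plain Python. It is not the authors' WROT extension: the function names `covariance` and `coral_loss` are hypothetical, and treating each class token as a plain feature vector is an assumption made for this example.

```python
def covariance(features):
    """Unbiased covariance matrix of a list of equal-length feature vectors."""
    n = len(features)
    d = len(features[0])
    # Per-dimension mean over the batch.
    means = [sum(row[j] for row in features) / n for j in range(d)]
    # Center every feature vector.
    centered = [[row[j] - means[j] for j in range(d)] for row in features]
    # d x d covariance with the (n - 1) normalization.
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def coral_loss(source, target):
    """Standard Deep CORAL loss: ||C_s - C_t||_F^2 / (4 * d^2).

    `source` and `target` are batches (lists) of d-dimensional vectors.
    Assumption: these stand in for class tokens from each domain.
    """
    d = len(source[0])
    cov_s = covariance(source)
    cov_t = covariance(target)
    sq_frobenius = sum(
        (cov_s[i][j] - cov_t[i][j]) ** 2 for i in range(d) for j in range(d)
    )
    return sq_frobenius / (4 * d * d)


# Hypothetical toy batches: variance lies along different axes per domain.
source_tokens = [[0.0, 0.0], [2.0, 0.0]]
target_tokens = [[0.0, 0.0], [0.0, 2.0]]
loss = coral_loss(source_tokens, target_tokens)
```

Because the loss depends only on covariances, shifting every vector in one domain by a constant leaves it unchanged; it responds to differences in how features vary, not where they are centered.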

Published in Sensors

ISSN: 1424-8220 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/sensors