IEEE Access (Jan 2023)

Recurrent DETR: Transformer-Based Object Detection for Crowded Scenes

  • Hyeong Kyu Choi,
  • Chong Keun Paik,
  • Hyun Woo Ko,
  • Min-Chul Park,
  • Hyunwoo J. Kim

DOI
https://doi.org/10.1109/ACCESS.2023.3293532
Journal volume & issue
Vol. 11
pp. 78623–78643

Abstract

Recent Transformer-based object detectors have achieved remarkable performance on benchmark datasets, but few have addressed the real-world challenge of object detection in crowded scenes using transformers. This limitation stems from the fixed query set size of the transformer decoder, which restricts the model’s inference capacity. To overcome this challenge, we propose Recurrent Detection Transformer (Recurrent DETR), an object detector that iterates the decoder block to render more predictions with a finite number of query tokens. Recurrent DETR can adaptively control the number of decoder block iterations based on the image’s crowdedness or complexity, resulting in a variable-size prediction set. This is enabled by our novel Pondering Hungarian Loss, which helps the model to learn when additional computation is required to identify all the objects in a crowded scene. We demonstrate the effectiveness of Recurrent DETR on two datasets: COCO 2017, which represents a standard setting, and CrowdHuman, which features a crowded setting. Our experiments on both datasets show that Recurrent DETR achieves significant performance gains of 0.8 AP and 0.4 AP, respectively, over its base architectures. Moreover, we conduct comprehensive analyses under different query set size constraints to provide a thorough evaluation of our proposed method.
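The core mechanism the abstract describes, namely reusing a fixed query set across repeated decoder iterations and stopping adaptively based on scene crowdedness, can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's implementation: the names `decoder_block` and `halt_score` are hypothetical stand-ins for the decoder and the learned halting signal trained via the Pondering Hungarian Loss.

```python
def recurrent_decode(queries, decoder_block, halt_score,
                     max_iters=6, halt_threshold=0.5):
    """Iterate a decoder block to emit a variable-size prediction set.

    queries:       initial query tokens (any value the callables accept)
    decoder_block: maps current queries -> (updated queries, predictions)
    halt_score:    maps updated queries -> float in [0, 1]; higher means
                   "enough objects found, stop iterating"
    Returns the accumulated predictions and the number of iterations used.
    """
    predictions = []
    step = 0
    for step in range(1, max_iters + 1):
        queries, preds = decoder_block(queries)
        predictions.extend(preds)           # prediction set grows each pass
        if halt_score(queries) >= halt_threshold:
            break                           # crowdedness-dependent early exit
    return predictions, step
```

With a toy decoder that emits two predictions per pass and a halting signal that fires on the third pass, the loop returns six predictions in three iterations, mimicking how a sparser image would halt sooner than a crowded one.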

Keywords