IET Computer Vision (Aug 2024)

Multi‐Scale Feature Attention‐DEtection TRansformer: Multi‐Scale Feature Attention for security check object detection

  • Haifeng Sima,
  • Bailiang Chen,
  • Chaosheng Tang,
  • Yudong Zhang,
  • Junding Sun

DOI
https://doi.org/10.1049/cvi2.12267
Journal volume & issue
Vol. 18, no. 5
pp. 613 – 625

Abstract

X‐ray security checks aim to detect contraband in luggage, but detection accuracy is hindered by overlapping objects and large size differences among objects in X‐ray images. To address these challenges, the authors introduce a novel network model named Multi‐Scale Feature Attention DEtection TRansformer (MSFA‐DETR). First, a pyramid feature extraction structure is embedded into the self‐attention module; this structure is referred to as MSFA. Through the MSFA module, MSFA‐DETR extracts multi‐scale feature information and amalgamates it into high‐level semantic features. These features are then combined by attention mechanisms to capture correlations between global information and multi‐scale features. MSFA significantly bolsters the model's robustness to objects of different sizes, thereby enhancing detection accuracy. In addition, a new initialisation method for object queries is proposed: the authors' foreground sequence extraction (FSE) module extracts key feature sequences from feature maps, which serve as prior knowledge for the object queries. FSE expedites the convergence of the DETR model and further elevates detection accuracy. Extensive experiments validate that the proposed model surpasses state‐of‐the‐art methods on the CLCXray and PIDray datasets.
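To make the multi‐scale fusion idea concrete, the following is a minimal sketch (not the authors' implementation) of the pattern the abstract describes: a feature map is pooled into a pyramid of scales, each level is flattened into token sequences, and a single self‐attention pass lets tokens attend across scales. The function names (`msfa_sketch`, `self_attention`), the average‐pooling pyramid, and the identity query/key/value projections are illustrative assumptions; a real model would use learned projections and the paper's specific architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # x: (seq_len, d). Identity projections for brevity; a real model
    # learns W_q, W_k, W_v matrices.
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)          # pairwise token affinities
    return softmax(scores, axis=-1) @ x    # attention-weighted mixture

def msfa_sketch(feat, scales=(1, 2, 4)):
    # feat: (H, W, d) feature map. Build a pyramid by average pooling at
    # each scale, flatten every level into tokens, and fuse all levels in
    # one self-attention pass so tokens attend across scales.
    H, W, d = feat.shape
    tokens = []
    for s in scales:
        pooled = (feat[:H // s * s, :W // s * s]
                  .reshape(H // s, s, W // s, s, d)
                  .mean(axis=(1, 3)))
        tokens.append(pooled.reshape(-1, d))
    seq = np.concatenate(tokens, axis=0)   # multi-scale token sequence
    return self_attention(seq)             # cross-scale fused features

fused = msfa_sketch(np.random.rand(8, 8, 16))
print(fused.shape)  # (64 + 16 + 4, 16) = (84, 16)
```

The key design point mirrored here is that coarse (heavily pooled) and fine tokens sit in the same attention sequence, so correlations between global context and local detail are captured in a single operation rather than by per-scale processing.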

Keywords