A Novel Multi-Scale Transformer for Object Detection in Aerial Scenes

Guanlin Lu; Xiaohui He; Qiang Wang; Faming Shao; Hongwei Wang; Jinkang Wang

doi:10.3390/drones6080188

Drones (Jul 2022)

Affiliations

Guanlin Lu: Department of Mechanical Engineering, College of Field Engineering, Army Engineering University, PLA, Nanjing 210007, China
Xiaohui He: Department of Mechanical Engineering, College of Field Engineering, Army Engineering University, PLA, Nanjing 210007, China
Qiang Wang: Department of Mechanical Engineering, College of Field Engineering, Army Engineering University, PLA, Nanjing 210007, China
Faming Shao: Department of Mechanical Engineering, College of Field Engineering, Army Engineering University, PLA, Nanjing 210007, China
Hongwei Wang: Department of Mechanical Engineering, College of Field Engineering, Army Engineering University, PLA, Nanjing 210007, China
Jinkang Wang: Department of Mechanical Engineering, College of Field Engineering, Army Engineering University, PLA, Nanjing 210007, China

DOI: https://doi.org/10.3390/drones6080188
Journal volume & issue: Vol. 6, no. 8
p. 188

Abstract

Deep learning has advanced research on object detection in aerial scenes. However, most existing networks are limited by the large variation in object scale and by confusion between category features. To overcome these limitations, this paper proposes a novel aerial object detection framework called DFCformer. DFCformer consists of three main parts. The backbone network, DMViT, introduces deformation patch embedding and multi-scale adaptive self-attention to capture sufficient features of the objects. FRGC guides feature interaction layer by layer to break the barriers between feature layers and to improve the discrimination and processing of multi-scale critical features. CAIM adopts an attention mechanism to fuse multi-scale features, performing hierarchical reasoning on the relationships between different levels and fully utilizing the complementary information in multi-scale features. Extensive experiments on the FAIR1M dataset show that DFCformer achieves the highest scores and stronger scene adaptability.

Published in Drones

ISSN: 2504-446X (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Motor vehicles. Aeronautics. Astronautics
Website: http://www.mdpi.com/journal/drones
