IEEE Access (Jan 2022)

MStrans: Multiscale Vision Transformer for Aerial Objects Detection

  • Guanlin Lu,
  • Xiaohui He,
  • Qiang Wang,
  • Faming Shao,
  • Jinkang Wang,
  • Likai Hao

DOI
https://doi.org/10.1109/ACCESS.2022.3190415
Journal volume & issue
Vol. 10
pp. 75971 – 75985

Abstract

Detecting objects in aerial images is challenging because of large variations in scale, arbitrary orientations, and tiny instances. This paper proposes MStrans, a new multi-scale transformer-based aerial object detector, to address these challenges. To detect remote instances, MStrans adopts a multi-scale patch embedding transformer (MViT) that effectively extracts the global features of the image. Furthermore, because the classification and regression branches rely on different discriminative features, a partial interactive fusion module (PIFM) is designed to enhance the semantic expression of the key features for each task by interactively modeling the features of adjacent layers. In addition, because the transformer may degrade local feature details while capturing long-distance feature dependencies, the paper designs a global-to-local interactive fusion module (GLIFM), which uses the strength of convolution at extracting local features to enrich the detailed information in the transformer. Experiments on the DOTA and DIOR datasets show that MStrans achieves superior detection performance compared with other approaches.

Keywords