Remote Sensing (Mar 2022)

CRTransSar: A Visual Transformer Based on Contextual Joint Representation Learning for SAR Ship Detection

  • Runfan Xia,
  • Jie Chen,
  • Zhixiang Huang,
  • Huiyao Wan,
  • Bocai Wu,
  • Long Sun,
  • Baidong Yao,
  • Haibing Xiang,
  • Mengdao Xing

DOI
https://doi.org/10.3390/rs14061488
Journal volume & issue
Vol. 14, no. 6
p. 1488

Abstract


Synthetic aperture radar (SAR) image target detection is widely used in military, civilian, and other fields. However, existing detection methods achieve low accuracy because SAR targets exhibit strong scattering, unclear edge contours, multiple scales, strong sparseness, background interference, and other challenging characteristics. In response, this paper combines the global contextual perception of transformers with the local feature representation capabilities of convolutional neural networks (CNNs) and proposes a visual transformer framework for SAR target detection based on contextual joint-representation learning, referred to as CRTransSar. First, this paper adopts the recent Swin Transformer as the basic architecture. Next, it incorporates the CNN's local information capture and presents a backbone, called CRbackbone, based on contextual joint representation learning, which extracts richer contextual feature information while strengthening SAR target feature attributes. Furthermore, a new cross-resolution attention-enhancement neck, called CAENeck, is designed to enhance the feature representation of multiscale SAR targets. Our method attains a mAP of 97.0% on the SSDD dataset, reaching state-of-the-art performance. In addition, based on the HISEA-1 commercial SAR satellite, which has been launched into orbit and in whose development our research group participated, we released a larger-scale SAR multiclass target detection dataset, called SMCDD, which verifies the effectiveness of our method.
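To make the contextual joint-representation idea concrete, the following is a minimal PyTorch sketch of one way to fuse a convolutional branch (local features) with a self-attention branch (global context) in a single block. The module name `ContextualJointBlock`, the branch designs, and the 1×1 fusion convolution are illustrative assumptions, not the paper's actual CRbackbone or CAENeck implementation.

```python
import torch
import torch.nn as nn


class ContextualJointBlock(nn.Module):
    """Hypothetical sketch: fuse local CNN features with global
    self-attention features, in the spirit of contextual joint
    representation learning (details differ from the paper)."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Local branch: 3x3 convolution captures edge/texture cues.
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # Global branch: multi-head self-attention over spatial tokens.
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # 1x1 convolution fuses the concatenated local and global features.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        local = self.local(x)
        # Flatten the feature map into (B, H*W, C) tokens for attention.
        tokens = self.norm(x.flatten(2).transpose(1, 2))
        glob, _ = self.attn(tokens, tokens, tokens)
        glob = glob.transpose(1, 2).reshape(b, c, h, w)
        # Residual connection keeps the original SAR feature attributes.
        return self.fuse(torch.cat([local, glob], dim=1)) + x


if __name__ == "__main__":
    # Quick shape check on a dummy SAR feature map.
    block = ContextualJointBlock(channels=64)
    feats = torch.randn(2, 64, 32, 32)
    print(block(feats).shape)  # torch.Size([2, 64, 32, 32])
```

The residual fusion here is one common design choice for hybrid CNN-transformer blocks; the paper's Swin-based backbone uses windowed attention rather than the full global attention shown in this sketch.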

Keywords