Remote Sensing (Mar 2022)

CRTransSar: A Visual Transformer Based on Contextual Joint Representation Learning for SAR Ship Detection

  • Runfan Xia,
  • Jie Chen,
  • Zhixiang Huang,
  • Huiyao Wan,
  • Bocai Wu,
  • Long Sun,
  • Baidong Yao,
  • Haibing Xiang,
  • Mengdao Xing

DOI
https://doi.org/10.3390/rs14061488
Journal volume & issue
Vol. 14, no. 6
p. 1488

Abstract


Synthetic aperture radar (SAR) image target detection is widely used in military, civilian, and other fields. However, existing detection methods achieve low accuracy because SAR targets exhibit strong scattering, unclear edge contours, multiple scales, strong sparseness, background interference, and other challenging characteristics. In response, this paper combines the global contextual perception of transformers with the local feature representation capabilities of convolutional neural networks (CNNs) and proposes a visual transformer framework for SAR target detection based on contextual joint-representation learning, referred to as CRTransSar. First, this paper adopts the recent Swin Transformer as the basic architecture. Next, it incorporates the CNN's local information capture and presents a backbone, called CRbackbone, based on contextual joint representation learning, which extracts richer contextual feature information while strengthening SAR target feature attributes. Furthermore, a new cross-resolution attention-enhancement neck, called CAENeck, is designed to enhance the feature representation of multiscale SAR targets. Our method attains a mAP of 97.0% on the SSDD dataset, reaching state-of-the-art performance. In addition, based on the HISEA-1 commercial SAR satellite, which has been launched into orbit and in whose development our research group participated, we released a larger-scale SAR multiclass target detection dataset, called SMCDD, which verifies the effectiveness of our method.
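To make the contextual joint-representation idea concrete, the following is a minimal PyTorch sketch of one way to fuse a convolutional branch (local features) with a self-attention branch (global context) in a single block. The module name `ContextualJointBlock`, the branch designs, and the 1×1 fusion convolution are illustrative assumptions, not the paper's actual CRbackbone or CAENeck implementation.

```python
import torch
import torch.nn as nn


class ContextualJointBlock(nn.Module):
    """Hypothetical sketch: fuse local CNN features with global
    self-attention features, in the spirit of contextual joint
    representation learning (details differ from the paper)."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Local branch: 3x3 convolution captures edge/texture cues.
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # Global branch: multi-head self-attention over spatial tokens.
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # 1x1 convolution fuses the concatenated local and global features.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        local = self.local(x)
        # Flatten the feature map into (B, H*W, C) tokens for attention.
        tokens = self.norm(x.flatten(2).transpose(1, 2))
        glob, _ = self.attn(tokens, tokens, tokens)
        glob = glob.transpose(1, 2).reshape(b, c, h, w)
        # Residual connection keeps the original SAR feature attributes.
        return self.fuse(torch.cat([local, glob], dim=1)) + x


if __name__ == "__main__":
    # Quick shape check on a dummy SAR feature map.
    block = ContextualJointBlock(channels=64)
    feats = torch.randn(2, 64, 32, 32)
    print(block(feats).shape)  # torch.Size([2, 64, 32, 32])
```

The residual fusion here is one common design choice for hybrid CNN-transformer blocks; the paper's Swin-based backbone uses windowed attention rather than the full global attention shown in this sketch.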

Keywords