IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2023)
PVT-SAR: An Arbitrarily Oriented SAR Ship Detector With Pyramid Vision Transformer
Abstract
Deep learning has significantly advanced ship detection in synthetic aperture radar (SAR) images. Most previous works rely on convolutional neural networks (CNNs), which extract features through local receptive fields and are sensitive to noise. Moreover, these detectors perform poorly in large-scale and complex scenes because of strong interference from the inshore background and the variability of target imaging characteristics. In this article, a novel SAR ship detection framework is proposed. It establishes the pyramid vision transformer (PVT) paradigm for multiscale feature representation in SAR images and is therefore referred to as PVT-SAR. It overcomes the limitation of the CNN receptive field and captures global dependencies through the self-attention mechanism. Because the difficulties of object detection in SAR images differ considerably from those in natural images, directly applying an existing transformer structure, such as PVT-small, does not achieve satisfactory performance for SAR object detection. Relative to the PVT, PVT-SAR incorporates overlapping patch embedding and mixed transformer encoder modules to address the problems of densely arranged targets and insufficient data. A multiscale feature fusion module is then designed to further improve the detection of small targets. In addition, a normalized Gaussian Wasserstein distance loss is employed to suppress the influence of scattering interference at the ship's boundary. The superiority of the proposed PVT-SAR detector over several state-of-the-art oriented bounding box detectors is demonstrated in both inshore and offshore scenes on two commonly used SAR ship datasets (i.e., RSSDD and HRSID).
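To make the normalized Gaussian Wasserstein distance loss more concrete, the following is a minimal sketch of one common way such a loss can be computed for oriented boxes. It is not taken from the article: the abstract does not give the formulation, so the box parameterization `(cx, cy, w, h, theta)`, the covariance convention (half-extents as standard deviations), the normalizing `constant`, and all function names are assumptions made for illustration.

```python
import numpy as np


def box_to_gaussian(cx, cy, w, h, theta):
    """Model an oriented box as a 2-D Gaussian (assumed convention):
    mean = box center, covariance = R diag(w^2/4, h^2/4) R^T."""
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta), np.cos(theta)]])
    scale = np.diag([w ** 2 / 4.0, h ** 2 / 4.0])
    return np.array([cx, cy], dtype=float), rot @ scale @ rot.T


def spd_sqrt(mat):
    """Square root of a symmetric positive semidefinite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def wasserstein2_sq(box_a, box_b):
    """Squared 2-Wasserstein distance between the two box Gaussians."""
    mean_a, cov_a = box_to_gaussian(*box_a)
    mean_b, cov_b = box_to_gaussian(*box_b)
    root_a = spd_sqrt(cov_a)
    cross = spd_sqrt(root_a @ cov_b @ root_a)
    value = np.sum((mean_a - mean_b) ** 2) + np.trace(cov_a + cov_b - 2.0 * cross)
    return max(float(value), 0.0)


def nwd_loss(box_a, box_b, constant=12.0):
    """1 - exp(-W2 / C). The value of C is a hypothetical placeholder;
    in practice it would be tuned to the dataset's object sizes."""
    nwd = np.exp(-np.sqrt(wasserstein2_sq(box_a, box_b)) / constant)
    return 1.0 - float(nwd)
```

Because the distance is computed between distributions rather than from box overlap, it stays smooth and informative even when two small boxes do not overlap at all, and a box rotated by 180 degrees maps to the same Gaussian, so the loss does not penalize that angular ambiguity.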
Keywords