IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2025)

SARFA-Net: Shape-Aware Label Assignment and Refined Feature Alignment for Arbitrary-Oriented Object Detection in Remote Sensing Images

  • Yan Dong
  • Minghong Wei
  • Guangshuai Gao
  • Chunlei Li
  • Zhoufeng Liu

DOI
https://doi.org/10.1109/JSTARS.2025.3532039
Journal volume & issue
Vol. 18
pp. 8865 – 8881

Abstract

Arbitrary-oriented object detection in remote sensing images has witnessed significant progress in recent years. Numerous detection models achieve promising results; however, two main challenges still hinder their performance. On the one hand, current label assignment strategies suffer from an imbalance between positive and negative samples, particularly for objects with large aspect ratios and small scales, leading to Insufficient High-Quality Samples. On the other hand, fixed convolution kernels and coarse sampling positions are not well suited to rotated objects in complex remote sensing scenes, resulting in Feature Misalignment. To alleviate these issues, this article proposes SARFA-Net, which incorporates a Shape-Aware Label Assignment (SALA) strategy and a Refined Feature Alignment Module (RFAM). Specifically, SALA mitigates insufficient sampling for objects with extreme shapes; its core is the Shape-Aware Sampling module, which selects more high-quality positive samples within elliptical regions. To further strengthen SALA for objects with extremely small scales and large aspect ratios, a Threshold Compensation Module is designed that further exploits the objects' shape characteristics. Furthermore, RFAM adaptively aligns features by adjusting the sampling positions of the convolution kernels based on the refined anchors. Extensive experiments on five large-scale datasets, DIOR-R, DOTA-v1.0, HRSC2016, FAIR1M-v1.0, and UCAS-AOD, yield mAPs of 68.90%, 80.09%, 90.40%, 46.34%, and 90.01%, respectively, demonstrating the effectiveness of the proposed approach and its superiority over state-of-the-art methods. Compared with the baseline S²A-Net, SARFA-Net improves mAP by 1.30, 1.57, 0.23, 5.92, and 0.37 points, respectively, without additional data augmentation.
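
The abstract does not spell out the sampling rule, so the following is only a rough illustration of what selecting positive candidates "within elliptical regions" could look like. It is a minimal sketch, assuming the region is the ellipse inscribed in the oriented ground-truth box (semi-axes w/2 and h/2, rotated by the box angle). The function names, the box parameterization, and the inscribed-ellipse choice are all assumptions made for illustration, not the authors' implementation.

```python
import math


def in_inscribed_ellipse(px, py, cx, cy, w, h, theta):
    """Return True if (px, py) lies inside the ellipse inscribed in an oriented box.

    Assumed box format: center (cx, cy), width w, height h, and angle theta in
    radians, measured counter-clockwise from the x-axis.
    """
    dx, dy = px - cx, py - cy
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Rotate the offset into the box's local frame (inverse rotation by theta).
    u = cos_t * dx + sin_t * dy
    v = -sin_t * dx + cos_t * dy
    # Standard ellipse test with semi-axes w/2 and h/2.
    return (u / (w / 2.0)) ** 2 + (v / (h / 2.0)) ** 2 <= 1.0


def select_candidates(points, box):
    """Keep the candidate points that fall inside the box's inscribed ellipse."""
    cx, cy, w, h, theta = box
    return [p for p in points if in_inscribed_ellipse(p[0], p[1], cx, cy, w, h, theta)]


# Hypothetical elongated box: 8 wide, 2 high, axis-aligned.
box = (0.0, 0.0, 8.0, 2.0, 0.0)
points = [(3.0, 0.0), (0.0, 0.9), (3.9, 0.9), (0.0, 3.0)]
kept = select_candidates(points, box)
```

In this example the corner point (3.9, 0.9) lies inside the rectangle but outside the ellipse, so it is dropped. This is the sense in which an elliptical region is stricter than the box itself for elongated objects.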

Keywords