IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2024)

Progressive Complementation Network With Semantics and Details for Salient Object Detection in Optical Remote Sensing Images

  • Rundong Zhao,
  • Panpan Zheng,
  • Cui Zhang,
  • Liejun Wang

DOI
https://doi.org/10.1109/JSTARS.2024.3387442
Journal volume & issue
Vol. 17
pp. 8626 – 8641

Abstract

Most existing methods for salient object detection in optical remote sensing images apply the same strategy to features at every level, without fully considering the distinct characteristics of features at different levels. As a result, some high-level semantics and low-level details are neglected during feature extraction. Furthermore, existing methods often rely on simple convolution operations to build their feature extraction and fusion modules, and the inherent locality of convolution limits the performance of these models. To address these challenges, we propose a novel progressive complementation network with semantics and details (SDPCNet) consisting of three parts: a deep semantics aggregation module (DSAM), a semantics-guided feature complement module (SFCM), and a detail feature enhancement module (DFEM). Specifically, the DSAM is applied to the two highest-level features and is guided by a global view that combines long-range dependencies and local context generated by a transformer and dilated convolution. The DSAM mines the semantic information in high-level features to perceive object positions and alleviate the adverse effects of cluttered backgrounds. The SFCM operates on the two intermediate levels of features, performing global correlation modeling on the aggregated cross-level features. It enhances multiscale semantic information and edge details using multiple sets of dilated convolutions, addressing the uncertainty in the size and number of salient objects. The DFEM acts on the two lowest levels of features, enhancing edge details in the spatial dimension and emphasizing semantics in different channel dimensions. Its output is then fused with high-level features to augment feature diversity and reduce the impact of background noise.
Extensive experiments conducted on the ORSSD, EORSSD, and ORSI-4199 datasets demonstrate that our proposed SDPCNet outperforms 23 state-of-the-art methods across eight evaluation metrics.
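
The abstract's remarks on the locality of convolution and on the use of several dilation rates can be illustrated with a minimal sketch. This is not the SDPCNet implementation: it is a one-dimensional, pure-Python toy, and the function names (dilated_conv1d, receptive_field, multi_branch) are hypothetical and chosen only for illustration. It shows how one 3-tap kernel covers a wider input span as the dilation rate grows, which is the general idea behind multi-rate dilated branches for objects of varying size.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1-D dilated convolution (cross-correlation form).

    Illustrative only: real networks use 2-D learned kernels.
    """
    # Number of input positions spanned by one output value.
    span = (len(kernel) - 1) * dilation + 1
    out = []
    for start in range(len(signal) - span + 1):
        # Sample the input every `dilation` steps under the kernel taps.
        out.append(sum(w * signal[start + i * dilation]
                       for i, w in enumerate(kernel)))
    return out


def receptive_field(kernel_size, dilation):
    """Input span seen by a single dilated kernel."""
    return (kernel_size - 1) * dilation + 1


def multi_branch(signal, kernel, dilations):
    """Apply the same kernel at several dilation rates (assumed rates)."""
    return {d: dilated_conv1d(signal, kernel, d) for d in dilations}


if __name__ == "__main__":
    signal = list(range(9))
    kernel = [1, 1, 1]
    for d, response in multi_branch(signal, kernel, [1, 2, 4]).items():
        print(d, receptive_field(len(kernel), d), response)
```

With the same three weights, a dilation of 1 sees 3 neighbouring samples while a dilation of 4 sees a span of 9, so parallel branches respond to structures at different scales without adding parameters per branch.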

Keywords