IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2024)

Adaptive Dual-Stream Sparse Transformer Network for Salient Object Detection in Optical Remote Sensing Images

  • Jie Zhao,
  • Yun Jia,
  • Lin Ma,
  • Lidan Yu

DOI
https://doi.org/10.1109/JSTARS.2024.3365729
Journal volume & issue
Vol. 17
pp. 5173 – 5192

Abstract

Convolutional neural networks (CNNs) have demonstrated excellent performance in salient object detection for optical remote sensing images (ORSI-SOD). However, because CNNs extract features with a sliding window, their ability to capture global representations is limited. Therefore, an end-to-end detection model assisted by the vision transformer, the adaptive dual-stream sparse transformer network (ADSTNet), is proposed for ORSI-SOD. It effectively addresses the mutual compensation of global and local information in ORSI-SOD. In particular, an adaptive interaction encoder is devised that combines the multiscale sparse transformer and the pyramid atrous attention to constitute the adaptive dual-stream sparse encoder. This encoder collaborates with the CNN to enhance long-range dependency modeling and to preserve global information more effectively based on local features. In addition, a directional feature reconfiguration is constructed to extract texture details from multiple directional dimensions. Finally, we propose the adaptive feature cascade decoder, which synthesizes content information from the foreground, edges, and background to enhance the representational capacity of the image. Furthermore, a structural loss function, known as the weight compensation mechanism, is introduced to balance the boundary and saliency map segmentation losses. Extensive experiments show that the proposed model outperforms 26 state-of-the-art ORSI-SOD methods across eight evaluation metrics on two standard datasets. To verify its robustness, the generalization performance of the model on the recent, challenging ORSI-4199 dataset is also reported.
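
The abstract does not say how the weight compensation mechanism balances the boundary and saliency map losses. Purely as an illustration of the general idea, the sketch below assumes a fixed scalar weight between a saliency term (binary cross-entropy plus a soft IoU loss) and a boundary term (binary cross-entropy). The function names, the `w_edge` parameter, and the choice of loss terms are all hypothetical and are not taken from the paper.

```python
import math


def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy over flat lists of probabilities and 0/1 labels."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(pred)


def soft_iou_loss(pred, target, eps=1e-7):
    """1 - soft IoU, computed on probabilities rather than thresholded masks."""
    inter = sum(p * t for p, t in zip(pred, target))
    union = sum(p + t - p * t for p, t in zip(pred, target))
    return 1.0 - (inter + eps) / (union + eps)


def combined_loss(sal_pred, sal_gt, edge_pred, edge_gt, w_edge=0.5):
    """Weighted sum of a saliency term and a boundary term.

    w_edge is a hypothetical fixed weight in [0, 1]; the paper's actual
    weighting scheme is not described in the abstract.
    """
    sal_term = bce(sal_pred, sal_gt) + soft_iou_loss(sal_pred, sal_gt)
    edge_term = bce(edge_pred, edge_gt)
    return (1.0 - w_edge) * sal_term + w_edge * edge_term
```

With `w_edge=0.0` the boundary term is ignored entirely, and raising it shifts the optimization pressure toward edge accuracy; a learned or adaptive weight would replace the constant in a real training setup.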

Keywords