IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2024)

Cross-Scene Building Identification Based on Dual-Stream Neural Network and Efficient Channel Attention Mechanism

  • Wenmei Li,
  • Jiadong Zhang,
  • Hao Xia,
  • Qing Liu,
  • Yu Wang,
  • Yan Jia,
  • Yixiang Chen

DOI
https://doi.org/10.1109/JSTARS.2024.3375321
Journal volume & issue
Vol. 17
pp. 6920 – 6932

Abstract

With the widespread adoption of deep learning, neural network models are now used extensively for the recognition, classification, and segmentation of remote sensing images. Convolutional neural networks (CNNs) and fully convolutional networks, including variants such as Unet, have achieved strong results in this domain. Nevertheless, CNNs are limited in their ability to capture long-range global dependencies, whereas transformers handle such dependencies effectively. Considering this, we introduce the efficient channel attention-enhanced dual-stream neural network (ECA-DSNN) to improve the identification of buildings across different scenes. Specifically, we developed a dual-stream network that combines the Unet and transformer frameworks to capture both local and global context. In addition, we introduced an attention mechanism module to strengthen the model's generalization capability. Given the identification and generalization capability of ECA-DSNN, only fine-tuning and data augmentation are needed to achieve superior performance in cross-scene transfer, even with limited samples in the target domain. The results indicate that the proposed ECA-DSNN outperformed state-of-the-art methods, particularly in the experiment transferring from the source domain Beijing to the target domain Shanghai. In this scenario, the overall accuracy surpassed 96.3% and the F1 score exceeded 78.6%.
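
The abstract reports two metrics, overall accuracy and F1 score. As a minimal sketch of how they are conventionally computed for a binary building mask, the snippet below uses the standard textbook definitions; the paper's exact evaluation protocol is not stated here, so the function names and the toy data are illustrative assumptions only. It also shows why overall accuracy can sit well above F1 when building pixels are a minority of the image.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN for flat sequences of 0/1 labels (1 = building)."""
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p == 1 and t == 1:
            tp += 1
        elif p == 1 and t == 0:
            fp += 1
        elif p == 0 and t == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def overall_accuracy(pred, truth):
    """Fraction of all pixels labelled correctly: (TP + TN) / total."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def f1_score(pred, truth):
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy mask: 20 pixels, 4 of which are building pixels.
truth = [1, 1, 1, 1] + [0] * 16
# The prediction misses one building pixel and adds one false alarm.
pred = [1, 1, 1, 0] + [1] + [0] * 15

print(overall_accuracy(pred, truth))  # 18 of 20 pixels correct
print(f1_score(pred, truth))          # driven only by TP, FP, FN
```

Because true negatives count toward overall accuracy but not toward F1, a scene dominated by background pixels tends to produce a high accuracy alongside a noticeably lower F1.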

Keywords