STransFuse: Fusing Swin Transformer and Convolutional Neural Network for Remote Sensing Image Semantic Segmentation

Liang Gao; Hui Liu; Minhang Yang; Long Chen; Yaling Wan; Zhengqing Xiao; Yurong Qian

doi:10.1109/jstars.2021.3119654

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2021)

Affiliations

Liang Gao: College of Software, Xinjiang University, Urumqi, China
Hui Liu: College of Information Science and Engineering, Xinjiang University, Urumqi, China
Minhang Yang: College of Software, Xinjiang University, Urumqi, China
Long Chen: College of Software, Xinjiang University, Urumqi, China
Yaling Wan: College of Software, Xinjiang University, Urumqi, China
Zhengqing Xiao: College of Mathematics and Systems Science, Xinjiang University, Urumqi, China
Yurong Qian: College of Software, Xinjiang University, Urumqi, China

DOI: https://doi.org/10.1109/jstars.2021.3119654
Journal volume & issue: Vol. 14
pp. 10990 – 11003

Abstract

Convolutional neural networks (CNNs) have driven much of the applied research on remote sensing images. Because its receptive field has a fixed size, however, a CNN cannot model global semantic relevance. Self-attention-based Transformer models can capture global semantic information, but the patch-based computation that Transformers use for self-attention ignores the spatial information inside each patch. To address these issues, we propose STransFuse, a new semantic segmentation method for remote sensing images that combines the benefits of the Transformer and the CNN to improve segmentation quality across various remote sensing images. Unlike earlier fusion techniques based on Transformer models, we employ a staged model to extract coarse-grained and fine-grained feature representations at various semantic scales. To take full advantage of the features acquired at different stages, we design an adaptive fusion module that uses a self-attention mechanism to adaptively fuse the semantic information between features at different scales. The overall accuracy (OA) of the proposed model is 1.36% higher than the baseline on the Vaihingen dataset and 1.27% higher on the Potsdam dataset. The STransFuse model also compares favorably with other advanced models.
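For readers unfamiliar with the metric, overall accuracy (OA) in semantic segmentation is conventionally the fraction of pixels whose predicted class matches the reference label. The sketch below illustrates that standard definition only; it is not taken from the paper. The function name `overall_accuracy` and the small label maps are hypothetical, and the abstract does not state whether the reported gains are percentage points or relative improvements.

```python
def overall_accuracy(pred, truth):
    """Return the fraction of pixels whose predicted class equals the reference class.

    `pred` and `truth` are 2-D lists of integer class labels with identical shape.
    """
    same_shape = len(pred) == len(truth) and all(
        len(p_row) == len(t_row) for p_row, t_row in zip(pred, truth)
    )
    if not same_shape:
        raise ValueError("prediction and reference maps must have the same shape")

    total = sum(len(row) for row in truth)
    if total == 0:
        raise ValueError("label maps are empty")

    # Count pixel positions where the two maps agree.
    correct = sum(
        p == t
        for p_row, t_row in zip(pred, truth)
        for p, t in zip(p_row, t_row)
    )
    return correct / total


# Hypothetical 3x4 label maps (illustrative values, not data from the paper).
truth = [[0, 0, 1, 1],
         [0, 2, 2, 1],
         [3, 3, 2, 1]]
pred = [[0, 0, 1, 1],
        [0, 2, 1, 1],
        [3, 0, 2, 1]]

oa = overall_accuracy(pred, truth)
print(f"OA = {oa:.4f}")  # 10 of 12 pixels agree, so this prints OA = 0.8333
```

Benchmark evaluations often exclude some pixels (for example boundary or clutter regions) before computing OA; the sketch counts every pixel and leaves any such masking to the caller.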

Published in IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing

ISSN: 1939-1404 (Print); 2151-1535 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Ocean engineering; Science: Physics: Geophysics. Cosmic physics
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=4609443
