Swin-Conv-Dspp and Global Local Transformer for Remote Sensing Image Semantic Segmentation

Youda Mo; Huihui Li; Xiangling Xiao; Huimin Zhao; Xiaoyong Liu; Jin Zhan

doi:10.1109/JSTARS.2023.3280365

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2023)

Affiliations

Youda Mo: School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou, China
Huihui Li: School of Computer Science and Guangdong Provincial Key Laboratory of Intellectual Property and Big Data, Guangdong Polytechnic Normal University, Guangzhou, China
Xiangling Xiao: School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou, China
Huimin Zhao: School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou, China
Xiaoyong Liu: School of Data Science and Engineering, Guangdong Polytechnic Normal University, Guangzhou, China
Jin Zhan: School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou, China

DOI: https://doi.org/10.1109/JSTARS.2023.3280365
Journal volume & issue: Vol. 16
pp. 5284 – 5296

Abstract

Compared with traditional methods based on hand-crafted features, deep neural networks have achieved a certain degree of success in remote sensing (RS) image semantic segmentation. However, segmentation results still suffer from serious holes, rough edges, and false or even missed detections caused by lighting and shadows. To address these problems, this article proposes SCG-TransNet, an RS semantic segmentation model that is a hybrid of the Swin transformer and DeepLabv3+ and includes a Swin-Conv-Dspp (SCD) module and a global local transformer block (GLTB). First, the SCD module, which can efficiently extract feature information from objects at different scales, is used to mitigate the hole phenomenon and reduce the loss of detailed information. Second, we construct a GLTB with a spatial pyramid pooling shuffle module to extract critical detail information from the limited visible pixels of occluded objects, which effectively alleviates the difficulty of recognizing objects under occlusion. Finally, experimental results show that SCG-TransNet achieves a mean intersection over union of 70.29% on the Vaihingen dataset, which is 3% higher than the baseline model. It also achieves good results on the Potsdam dataset. These results demonstrate the effectiveness, robustness, and superiority of the proposed method compared with existing state-of-the-art methods.
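
For readers unfamiliar with the metric reported in the abstract, the following is a minimal sketch of how mean intersection over union is commonly computed over class labels. It is not the authors' evaluation code: the function name `mean_iou`, the flattened toy label lists, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes):
    """Average per-class IoU over classes that appear in pred or target.

    pred and target are equal-length sequences of integer class labels
    (a flattened segmentation map). Hypothetical helper for illustration.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both maps.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c in at least one map.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # assumption: ignore classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes (made-up labels, not data from the paper).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.4f}")
```

In this toy case the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5 up to floating-point rounding. Benchmark protocols may differ in which classes are included in the average.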

Published in IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing

ISSN: 1939-1404 (Print); 2151-1535 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Ocean engineering; Science: Physics: Geophysics. Cosmic physics
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=4609443
