SSDT: Scale-Separation Semantic Decoupled Transformer for Semantic Segmentation of Remote Sensing Images

Chengyu Zheng; Yanru Jiang; Xiaowei Lv; Jie Nie; Xinyue Liang; Zhiqiang Wei

doi:10.1109/JSTARS.2024.3383066

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2024)

Affiliations

Chengyu Zheng: College of Information Science and Engineering, Ocean University of China, Qingdao, China
Yanru Jiang: College of Information Science and Engineering, Ocean University of China, Qingdao, China
Xiaowei Lv: College of Information Science and Engineering, Ocean University of China, Qingdao, China
Jie Nie: College of Information Science and Engineering, Ocean University of China, Qingdao, China
Xinyue Liang: College of Information Science and Engineering, Ocean University of China, Qingdao, China
Zhiqiang Wei: College of Information Science and Engineering, Ocean University of China, Qingdao, China

Journal volume & issue: Vol. 17
pp. 9037 – 9052

Abstract

Semantic segmentation of remote sensing (RS) images classifies an image pixel by pixel, thereby decoupling it into its semantic components. Most traditional semantic decoupling methods perform decoupling without any scale-separation step, which causes problems for targets of different sizes. During semantic decoupling, a feature extractor whose scale is too large tends to ignore small-scale targets, while one whose scale is too small tends to split large-scale objects into fragments, reducing segmentation accuracy. To address this problem, we propose a scale-separated semantic decoupled transformer (SSDT), which first performs scale separation during semantic decoupling and then uses the resulting semantic features, rich in scale information, to guide the Transformer's feature extraction. The network consists of five modules: scale-separated patch extraction (SPE), semantic decoupled transformer (SDT), scale-separated feature extraction (SFE), semantic decoupling (SD), and a multiview feature fusion decoder (MFFD). SPE turns the original image into linear embedding sequences at three scales; SD groups pixels into semantic clusters with K-means and derives semantic features rich in scale information; and SDT, whose core is decoupled attention, improves intraclass compactness and interclass separability by computing the similarity between semantic features and image features. Finally, MFFD fuses salient features from different perspectives to further enhance the feature representation. Experiments on two large-scale, fine-resolution RS image datasets (Vaihingen and Potsdam) demonstrate the effectiveness of SSDT for RS image semantic segmentation.
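The abstract describes the pipeline only at a high level. The following is a minimal NumPy sketch of three of its ideas, written under explicit assumptions and not taken from the authors' implementation: SPE is approximated as non-overlapping patch embedding at three hypothetical patch sizes (4, 8, 16) with random projection weights; SD is approximated as plain K-means over the concatenated tokens, with cluster centers serving as "semantic features"; and the decoupled attention of SDT is stood in for by single-head scaled dot-product cross-attention from image tokens to those centers. All function names (`patch_embed`, `multi_scale_embed`, `kmeans`, `semantic_cross_attention`) are hypothetical.

```python
import numpy as np


def patch_embed(image: np.ndarray, patch: int, weight: np.ndarray) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping patch x patch tiles,
    flatten each tile, and project it linearly to a D-dim token."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "H and W must be divisible by patch"
    tiles = image.reshape(h // patch, patch, w // patch, patch, c)
    tiles = tiles.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * c)
    return tiles @ weight  # (num_patches, D)


def multi_scale_embed(image, patch_sizes, dim, rng):
    """Assumed stand-in for SPE: one token sequence per patch size."""
    c = image.shape[2]
    return [patch_embed(image, p, rng.standard_normal((p * p * c, dim)) * 0.02)
            for p in patch_sizes]


def _assign(x: np.ndarray, centers: np.ndarray) -> np.ndarray:
    # Index of the nearest center (squared Euclidean distance) for every row of x.
    return ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)


def kmeans(x: np.ndarray, k: int, iters: int = 20, seed: int = 0):
    """Plain K-means (assumed stand-in for SD); returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = _assign(x, centers)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the previous center if a cluster is empty
                centers[j] = members.mean(0)
    return centers, _assign(x, centers)  # labels consistent with final centers


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def semantic_cross_attention(tokens: np.ndarray, centers: np.ndarray):
    """Tokens (N, D) attend to semantic centers (K, D): similarity, softmax
    weights, weighted sum of centers, plus a residual connection."""
    d = tokens.shape[1]
    weights = softmax(tokens @ centers.T / np.sqrt(d), axis=1)  # (N, K)
    return tokens + weights @ centers, weights


# Demo on a random 32x32 RGB "image".
rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
scales = multi_scale_embed(image, (4, 8, 16), dim=16, rng=rng)  # 64, 16, 4 tokens
tokens = np.concatenate(scales, axis=0)                           # 84 tokens
centers, labels = kmeans(tokens, k=6)
out, attn = semantic_cross_attention(tokens, centers)
```

In this sketch each token's attention weights over the K semantic centers sum to 1, so every token is pulled toward the centers it most resembles; this is one plausible reading of how similarity to semantic features could tighten intraclass features, not a reproduction of SSDT's decoupled attention.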

Published in IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing

ISSN: 1939-1404 (Print); 2151-1535 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Ocean engineering; Science: Physics: Geophysics. Cosmic physics
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=4609443