DEUFormer: High‐precision semantic segmentation for urban remote sensing images

Xinqi Jia; Xiaoyong Song; Lei Rao; Guangyu Fan; Songlin Cheng; Niansheng Chen

doi:10.1049/cvi2.12313

IET Computer Vision (Dec 2024)

Affiliations

Xinqi Jia: Shanghai Dianji University, Shanghai, China
Xiaoyong Song: Shanghai Dianji University, Shanghai, China
Lei Rao: Shanghai Dianji University, Shanghai, China
Guangyu Fan: Shanghai Dianji University, Shanghai, China
Songlin Cheng: Shanghai Dianji University, Shanghai, China
Niansheng Chen: Shanghai Dianji University, Shanghai, China

DOI: https://doi.org/10.1049/cvi2.12313
Journal volume & issue: Vol. 18, no. 8, pp. 1209–1222

Abstract

Semantic segmentation of urban remote sensing images has a wide range of applications, including urban planning, resource exploration, and intelligent transportation. Although UNetFormer performs well by introducing the self‐attention mechanism of the Transformer, it still suffers from relatively low segmentation accuracy and significant edge segmentation errors. To address this, this paper proposes DEUFormer, which employs a special weighted sum method to fuse the features of the encoder and the decoder, thus capturing both local details and global context information. Moreover, an Enhanced Feature Refinement Head is designed to finely re‐weight features along the channel dimension and narrow the semantic gap between shallow and deep features, thereby enhancing multi‐scale feature extraction. Additionally, an Edge‐Guided Context Module is introduced to enhance edge areas through effective edge detection, which improves edge information extraction. Experimental results show that DEUFormer achieves a mean Intersection over Union (mIoU) of 53.8% on the LoveDA dataset and 69.1% on the UAVid dataset. Notably, the mIoU of buildings in the LoveDA dataset is 5.0% higher than that of UNetFormer. The proposed model outperforms methods such as UNetFormer on multiple datasets, which demonstrates its effectiveness.
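
For readers unfamiliar with the reported metric, the following is a minimal sketch of how mIoU is commonly computed from per-pixel class labels. It is not the authors' evaluation code; the function names and the toy labels are assumptions made for illustration, and benchmark toolkits for LoveDA or UAVid may handle ignored or absent classes differently.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels by (true class, predicted class) from flat label lists."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def iou_per_class(cm):
    """IoU = TP / (TP + FP + FN) for each class; None if the class never occurs."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                          # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp     # predicted c, true class otherwise
        denom = tp + fp + fn
        ious.append(tp / denom if denom else None)
    return ious


def mean_iou(cm):
    """Average the per-class IoU over classes that occur in truth or prediction."""
    valid = [v for v in iou_per_class(cm) if v is not None]
    return sum(valid) / len(valid)


# Toy example (hypothetical labels): a 2x2 image flattened to 4 pixels, 2 classes.
truth = [0, 0, 1, 1]
pred = [0, 1, 1, 1]
cm = confusion_matrix(truth, pred, num_classes=2)
print(iou_per_class(cm), mean_iou(cm))
```

In this toy case class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12. A per-class figure, such as the building result quoted above, corresponds to a single entry of `iou_per_class`.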

Published in IET Computer Vision

ISSN: 1751-9632 (Print); 1751-9640 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: https://ietresearch.onlinelibrary.wiley.com/journal/17519640
