Remote Sensing (Oct 2022)

DSANet: A Deep Supervision-Based Simple Attention Network for Efficient Semantic Segmentation in Remote Sensing Imagery

  • Wenxu Shi,
  • Qingyan Meng,
  • Linlin Zhang,
  • Maofan Zhao,
  • Chen Su,
  • Tamás Jancsó

DOI
https://doi.org/10.3390/rs14215399
Journal volume & issue
Vol. 14, no. 21
p. 5399

Abstract


Semantic segmentation for remote sensing images (RSIs) plays an important role in many applications, such as urban planning, environmental protection, agricultural valuation, and military reconnaissance. With the boom in remote sensing technology, RSIs are generated in volumes that are difficult for current complex networks to handle, so efficient networks are the key to meeting this challenge. Many previous works design lightweight networks or apply pruning and knowledge distillation to obtain efficient models, but these approaches inevitably reduce the ability of the resulting models to characterize spatial and semantic features. We propose an effective deep supervision-based simple attention network (DSANet) with spatial and semantic enhancement losses to address these problems. In the network, (1) a lightweight architecture is used as the backbone; (2) deep supervision modules with improved multiscale spatial detail (MSD) and hierarchical semantic enhancement (HSE) losses synergistically strengthen the obtained feature representations; and (3) a simple embedding attention module (EAM) with linear complexity performs long-range relationship modeling. Experiments conducted on two public RSI datasets (the ISPRS Potsdam and Vaihingen datasets) demonstrate the substantial advantages of the proposed approach. Our method achieves 79.19% mean intersection over union (mIoU) on the ISPRS Potsdam test set and 72.26% mIoU on the Vaihingen test set, with speeds of 470.07 FPS on 512 × 512 images and 5.46 FPS on 6000 × 6000 images using an RTX 3090 GPU.
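
To make the "linear complexity" claim for the attention module concrete, the sketch below shows a generic efficient-attention layer whose cost grows linearly with the number of pixels: queries and keys are softmax-normalized and aggregated through a small channel-by-channel context matrix instead of a full pixel-by-pixel affinity map. This is a minimal illustration of the general technique only; the class name EmbeddingAttention, the channel sizes, and the exact normalization are assumptions for illustration and are not taken from the paper's EAM.

import torch
import torch.nn as nn


class EmbeddingAttention(nn.Module):
    """Attention whose cost is linear in the number of pixels (H*W).

    Illustrative sketch, not the authors' EAM: a C_k x C context matrix is
    built from values and keys, then redistributed to every position via
    the queries, avoiding the quadratic (H*W) x (H*W) affinity map.
    """

    def __init__(self, in_channels: int, key_channels: int = 64):
        super().__init__()
        self.to_q = nn.Conv2d(in_channels, key_channels, kernel_size=1)
        self.to_k = nn.Conv2d(in_channels, key_channels, kernel_size=1)
        self.to_v = nn.Conv2d(in_channels, in_channels, kernel_size=1)
        self.proj = nn.Conv2d(in_channels, in_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.to_q(x).flatten(2)                  # (B, Ck, N), N = H*W
        k = self.to_k(x).flatten(2)                  # (B, Ck, N)
        v = self.to_v(x).flatten(2)                  # (B, C,  N)
        q = q.softmax(dim=1)                         # normalize over channels
        k = k.softmax(dim=-1)                        # normalize over positions
        context = torch.bmm(v, k.transpose(1, 2))    # (B, C, Ck): O(N) cost
        out = torch.bmm(context, q).view(b, c, h, w) # redistribute to pixels
        return x + self.proj(out)                    # residual connection


if __name__ == "__main__":
    feat = torch.randn(1, 128, 64, 64)               # a backbone feature map
    print(EmbeddingAttention(128)(feat).shape)       # torch.Size([1, 128, 64, 64])

Because the affinity computation is factored through the key-channel dimension, memory and compute scale with H*W rather than (H*W)^2, which is what allows attention of this kind to run on large remote sensing tiles at high frame rates.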

Keywords