Journal of Geodesy and Geoinformation Science (Dec 2023)
Multi-task Learning of Semantic Segmentation and Height Estimation for Multi-modal Remote Sensing Images
Abstract
Deep learning-based methods have been successfully applied to semantic segmentation of optical remote sensing images. However, as increasing volumes of remote sensing data become available, comprehensively exploiting multi-modal remote sensing data to overcome the performance bottleneck of single-modal interpretation has become a new challenge. In addition, semantic segmentation and height estimation from remote sensing data are two strongly correlated tasks, yet existing methods usually study them separately, which leads to high computational overhead. To this end, we propose a Multi-Task learning framework for Multi-Modal remote sensing images (MM_MT). Specifically, we design a Cross-Modal Feature Fusion (CMFF) method, which aggregates complementary information from different modalities to improve the accuracy of semantic segmentation and height estimation. In addition, a dual-stream multi-task learning method is introduced for Joint Semantic Segmentation and Height Estimation (JSSHE), which extracts common features in a shared network to save time and resources and then learns task-specific features in two task branches. Experimental results on the public multi-modal Potsdam remote sensing dataset show that, compared with training the two tasks independently, multi-task learning saves 20% of training time and achieves competitive performance, with an mIoU of 83.02% for semantic segmentation and an accuracy of 95.26% for height estimation.
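To make the described architecture concrete, the following is a minimal PyTorch sketch of a dual-stream multi-task network: two modality encoders, a simple cross-modal fusion step standing in for CMFF, a shared feature stage, and separate segmentation and height-regression branches. All module names, layer sizes, and the fusion-by-concatenation design are illustrative assumptions for exposition, not the authors' exact implementation.

```python
# Hypothetical sketch: dual-stream encoders, cross-modal fusion, shared features,
# and two task heads (semantic segmentation + height estimation).
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    """Two 3x3 convolutions with batch norm and ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class CrossModalFusion(nn.Module):
    """Fuse two modality feature maps by concatenation + 1x1 projection
    (a simple stand-in for the paper's CMFF module)."""

    def __init__(self, channels):
        super().__init__()
        self.project = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat_a, feat_b):
        return self.project(torch.cat([feat_a, feat_b], dim=1))


class MultiTaskNet(nn.Module):
    """Shared fused features feed two task-specific branches."""

    def __init__(self, num_classes=6, base_ch=64):
        super().__init__()
        # One lightweight encoder per modality (e.g., optical image and an auxiliary raster).
        self.enc_optical = conv_block(3, base_ch)
        self.enc_aux = conv_block(1, base_ch)
        self.fusion = CrossModalFusion(base_ch)
        self.shared = conv_block(base_ch, base_ch)
        # Task branch 1: per-pixel class logits for semantic segmentation.
        self.seg_head = nn.Conv2d(base_ch, num_classes, kernel_size=1)
        # Task branch 2: per-pixel height regression (single channel).
        self.height_head = nn.Conv2d(base_ch, 1, kernel_size=1)

    def forward(self, optical, aux):
        fused = self.fusion(self.enc_optical(optical), self.enc_aux(aux))
        shared = self.shared(fused)
        return self.seg_head(shared), self.height_head(shared)


if __name__ == "__main__":
    net = MultiTaskNet(num_classes=6)
    optical = torch.randn(2, 3, 256, 256)   # e.g., an optical patch
    aux = torch.randn(2, 1, 256, 256)        # e.g., a second-modality patch
    seg_logits, height = net(optical, aux)
    # Joint objective: cross-entropy for segmentation plus a weighted L1 term for height.
    seg_target = torch.randint(0, 6, (2, 256, 256))
    height_target = torch.randn(2, 1, 256, 256)
    loss = nn.CrossEntropyLoss()(seg_logits, seg_target) + \
           0.5 * nn.L1Loss()(height, height_target)
    print(seg_logits.shape, height.shape, float(loss))
```

The weighted sum of the two task losses shown above is one common way to train such a shared network jointly; the specific loss terms and weighting used in the paper may differ.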
Keywords