Journal of Geodesy and Geoinformation Science (Dec 2023)
Multi-task Learning of Semantic Segmentation and Height Estimation for Multi-modal Remote Sensing Images
Abstract
Deep learning-based methods have been successfully applied to semantic segmentation of optical remote sensing images. However, as increasing volumes of remote sensing data become available, comprehensively exploiting multi-modal remote sensing data to overcome the performance bottleneck of single-modal interpretation has become a new challenge. In addition, semantic segmentation and height estimation from remote sensing data are two strongly correlated tasks, yet existing methods usually study them separately, which leads to high computational overhead. To this end, we propose a Multi-Task learning framework for Multi-Modal remote sensing images (MM_MT). Specifically, we design a Cross-Modal Feature Fusion (CMFF) method, which aggregates complementary information from different modalities to improve the accuracy of semantic segmentation and height estimation. In addition, a dual-stream multi-task learning method is introduced for Joint Semantic Segmentation and Height Estimation (JSSHE), which extracts common features in a shared network to save time and resources and then learns task-specific features in two task branches. Experimental results on the public multi-modal Potsdam remote sensing dataset show that, compared with training the two tasks independently, multi-task learning saves 20% of training time and achieves competitive performance, with an mIoU of 83.02% for semantic segmentation and an accuracy of 95.26% for height estimation.
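To make the described architecture concrete, the following is a minimal PyTorch sketch of a dual-stream multi-task network: two modality encoders, a simple cross-modal fusion step standing in for CMFF, a shared feature stage, and separate segmentation and height-regression branches. All module names, layer sizes, and the fusion-by-concatenation design are illustrative assumptions for exposition, not the authors' exact implementation.

```python
# Hypothetical sketch: dual-stream encoders, cross-modal fusion, shared features,
# and two task heads (semantic segmentation + height estimation).
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    """Two 3x3 convolutions with batch norm and ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class CrossModalFusion(nn.Module):
    """Fuse two modality feature maps by concatenation + 1x1 projection
    (a simple stand-in for the paper's CMFF module)."""

    def __init__(self, channels):
        super().__init__()
        self.project = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat_a, feat_b):
        return self.project(torch.cat([feat_a, feat_b], dim=1))


class MultiTaskNet(nn.Module):
    """Shared fused features feed two task-specific branches."""

    def __init__(self, num_classes=6, base_ch=64):
        super().__init__()
        # One lightweight encoder per modality (e.g., optical image and an auxiliary raster).
        self.enc_optical = conv_block(3, base_ch)
        self.enc_aux = conv_block(1, base_ch)
        self.fusion = CrossModalFusion(base_ch)
        self.shared = conv_block(base_ch, base_ch)
        # Task branch 1: per-pixel class logits for semantic segmentation.
        self.seg_head = nn.Conv2d(base_ch, num_classes, kernel_size=1)
        # Task branch 2: per-pixel height regression (single channel).
        self.height_head = nn.Conv2d(base_ch, 1, kernel_size=1)

    def forward(self, optical, aux):
        fused = self.fusion(self.enc_optical(optical), self.enc_aux(aux))
        shared = self.shared(fused)
        return self.seg_head(shared), self.height_head(shared)


if __name__ == "__main__":
    net = MultiTaskNet(num_classes=6)
    optical = torch.randn(2, 3, 256, 256)   # e.g., an optical patch
    aux = torch.randn(2, 1, 256, 256)        # e.g., a second-modality patch
    seg_logits, height = net(optical, aux)
    # Joint objective: cross-entropy for segmentation plus a weighted L1 term for height.
    seg_target = torch.randint(0, 6, (2, 256, 256))
    height_target = torch.randn(2, 1, 256, 256)
    loss = nn.CrossEntropyLoss()(seg_logits, seg_target) + \
           0.5 * nn.L1Loss()(height, height_target)
    print(seg_logits.shape, height.shape, float(loss))
```

The weighted sum of the two task losses shown above is one common way to train such a shared network jointly; the specific loss terms and weighting used in the paper may differ.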
Keywords