FusionHeightNet: A Multi-Level Cross-Fusion Method from Multi-Source Remote Sensing Images for Urban Building Height Estimation

Chao Ma; Yueting Zhang; Jiayi Guo; Guangyao Zhou; Xiurui Geng

doi:10.3390/rs16060958

Remote Sensing (Mar 2024)

Affiliations

Chao Ma: Key Laboratory of Technology in Geo-Spatial Information Processing and Application Systems, Beijing 100190, China
Yueting Zhang: Key Laboratory of Technology in Geo-Spatial Information Processing and Application Systems, Beijing 100190, China
Jiayi Guo: Key Laboratory of Technology in Geo-Spatial Information Processing and Application Systems, Beijing 100190, China
Guangyao Zhou: Key Laboratory of Technology in Geo-Spatial Information Processing and Application Systems, Beijing 100190, China
Xiurui Geng: Key Laboratory of Technology in Geo-Spatial Information Processing and Application Systems, Beijing 100190, China

DOI: https://doi.org/10.3390/rs16060958
Journal volume & issue: Vol. 16, no. 6, p. 958

Abstract

Extracting buildings in urban scenes from remote sensing images is crucial for the construction of digital cities, urban monitoring, urban planning, and autonomous driving. Traditional methods generally rely on shadow detection or on stereo matching of multi-view high-resolution remote sensing images, both of which are costly. Recently, machine learning has provided solutions for estimating building heights from remote sensing images, but challenges remain due to limited observation angles and image quality. The inherent lack of information in a single modality greatly limits the extraction precision. This article proposes a method for urban building height estimation from multi-source remote sensing images. The method is characterized by multi-level cross-fusion, multi-task joint learning of footprint extraction and height estimation, and the use of semantic information to refine the height estimation results. The complementary and effective features of synthetic aperture radar (SAR) and electro-optical (EO) images are transferred through multi-level cross-fusion. We use the semantic information of the footprint extraction branch to refine the height estimation results from coarse to fine. Finally, we evaluate our model on the SpaceNet 6 dataset and achieve 0.3849 on the height estimation metric δ1 and 0.7231 on the footprint extraction metric Dice, which indicates an improvement over other methods.
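
For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how they are commonly computed. It assumes that δ1 follows the usual threshold-accuracy definition from depth estimation (the share of pixels where max(pred/gt, gt/pred) < 1.25) and that Dice is the standard overlap score 2|A∩B| / (|A| + |B|). The abstract does not spell out either definition, so the function names, the 1.25 threshold, and the choice to skip pixels with non-positive height are illustrative assumptions rather than the authors' evaluation code.

```python
from typing import Sequence


def delta1(pred: Sequence[float], target: Sequence[float],
           threshold: float = 1.25) -> float:
    """Share of valid pixels whose ratio max(pred/gt, gt/pred) is below threshold.

    Assumption: pixels with non-positive predicted or reference height are
    skipped, because the ratio is undefined there.
    """
    valid = [(p, t) for p, t in zip(pred, target) if p > 0 and t > 0]
    if not valid:
        return 0.0
    hits = sum(1 for p, t in valid if max(p / t, t / p) < threshold)
    return hits / len(valid)


def dice(pred_mask: Sequence[int], target_mask: Sequence[int]) -> float:
    """Dice coefficient 2|A∩B| / (|A| + |B|) for flattened binary masks."""
    intersection = sum(1 for p, t in zip(pred_mask, target_mask) if p and t)
    total = sum(1 for p in pred_mask if p) + sum(1 for t in target_mask if t)
    # Two empty masks agree perfectly; treating that case as 1.0 is a convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy flattened height maps in metres (made-up values for illustration only).
pred_heights = [10.0, 12.0, 30.0, 5.0]
true_heights = [10.0, 10.0, 20.0, 0.0]   # last pixel has no building, so it is skipped
print(delta1(pred_heights, true_heights))

# Toy flattened footprint masks (1 = building, 0 = background).
pred_footprint = [1, 1, 0, 1]
true_footprint = [1, 0, 0, 1]
print(dice(pred_footprint, true_footprint))
```

In the toy example, two of the three valid pixels fall within the 1.25 ratio, so δ1 is 2/3, and the masks share two building pixels out of 3 + 2, so Dice is 0.8. Both metrics lie in [0, 1], with higher values being better.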

Published in Remote Sensing

ISSN: 2072-4292 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science
Website: http://www.mdpi.com/journal/remotesensing/
