International Journal of Applied Earth Observations and Geoinformation (Aug 2024)

Hierarchical building use classification from multiple modalities with a multi-label multimodal transformer network

  • Wen Zhou,
  • Claudio Persello,
  • Alfred Stein

Journal volume & issue
Vol. 132
p. 104038

Abstract

Building use information is important for urban planning, city digital twins, and informed policy formulation. Prior research has predominantly mapped building use in broad categories, which offers only general insight into how buildings are actually used. Our study investigates the extraction of hierarchical building use categories, encompassing both broad and detailed classes while accounting for mixed use. To achieve this, we explore the fusion of building function information from satellite images, digital surface models (DSM), street view images, and point of interest (POI) data. We propose a novel multi-label multimodal transformer-based feature fusion network that simultaneously predicts four broad categories and 13 detailed categories. Experimental results demonstrate the efficacy of our method: it maps most building use categories, with weighted average F1 scores of 91% for the four broad categories and 77% for the 13 detailed categories. Our experiments underscore the critical role of satellite images in building use classification, with the inclusion of DSM and POI data significantly enhancing classification accuracy. By considering detailed use categories and accounting for mixed use, our method provides finer-grained insight into land use patterns, thereby contributing to urban planning and management.
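
For readers unfamiliar with the metric, the following is a minimal sketch of how a support-weighted average F1 score can be computed for multi-label predictions, where a mixed-use building carries more than one positive label. It is an illustration under assumptions, not the authors' evaluation code: the function names, the toy label matrix, and the choice to weight each category by its number of true instances are all hypothetical, and the abstract does not state the exact averaging procedure used.

```python
from typing import Sequence

Matrix = Sequence[Sequence[int]]  # rows = buildings, columns = use categories (0/1)


def per_label_f1(y_true: Matrix, y_pred: Matrix, label: int) -> float:
    """F1 score for a single category column."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t[label] == 1 and p[label] == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t[label] == 0 and p[label] == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t[label] == 1 and p[label] == 0)
    denom = 2 * tp + fp + fn
    # Assumption: a category with no true and no predicted instances scores 0.
    return 2 * tp / denom if denom else 0.0


def weighted_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """Average of per-category F1 scores, weighted by each category's support."""
    n_labels = len(y_true[0])
    supports = [sum(row[j] for row in y_true) for j in range(n_labels)]
    total = sum(supports)
    if total == 0:
        return 0.0
    weighted_sum = sum(
        per_label_f1(y_true, y_pred, j) * supports[j] for j in range(n_labels)
    )
    return weighted_sum / total


# Hypothetical toy data: 4 buildings, 4 unnamed broad categories.
# Rows 2 and 4 are mixed-use (two positive labels each).
y_true = [[1, 0, 0, 0], [1, 1, 0, 0], [0, 0, 1, 0], [0, 1, 0, 1]]
y_pred = [[1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 1]]

score = weighted_f1(y_true, y_pred)
print(f"weighted F1 = {score:.3f}")
```

In this toy case the model misses one of the two uses of the second building, so the second category has an F1 of 2/3 while the others score 1. Weighting by support means frequent categories dominate the average, which is one reason a broad-category score can sit well above a detailed-category score.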

Keywords