Energy and AI (Sep 2024)
Leveraging machine learning to generate a unified and complete building height dataset for Germany
Abstract
Building geometry data is crucial for detailed, spatially explicit analyses of the building stock in energy systems analysis and beyond. Despite the existence of diverse datasets and methods, a standardized and validated approach for creating a unified and complete nationwide dataset of German building heights is not yet available. This study develops and validates such a methodology, combining different data sources for building footprints and heights and filling gaps in the height data with an XGBoost machine learning model. The XGBoost model achieves a mean absolute error of 1.78 m at the national level and between 1.52 m and 3.47 m at the federal state level. The goal is to demonstrate the applicability of the methodology at a large scale and to create a useful dataset. The resulting dataset is thoroughly evaluated on a building-by-building level, and spatially resolved statistics on its quality are reported. This detailed validation found that the building count and footprint area of the German building stock are 90.31 % and 94.84 % correct, respectively, and that the building height accuracy is 0.59 m at the national level. However, errors are not homogeneous across Germany, and further research is needed into the impact of including additional datasets, especially for regions and building types with lower accuracies. The study shows that the chosen methodology is suitable for generating a building height dataset and that the workflow, with some modifications for regional data availability, can be transferred to other countries. The generated building dataset for Germany constitutes a valuable data basis for the research community in fields such as energy research, urban planning and building decarbonization policy development.
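To make the reported error metric concrete, the following minimal sketch shows how a mean absolute error could be computed both over all buildings and per federal state. It is an illustration under stated assumptions, not the study's evaluation code: the function names, the record layout and the sample heights are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict


def mean_absolute_error(reference, predicted):
    """Average of |reference - predicted| over all buildings (in metres)."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(r - p) for r, p in zip(reference, predicted)) / len(reference)


def mae_by_state(records):
    """Group (state, reference_m, predicted_m) records and compute MAE per state."""
    grouped = defaultdict(lambda: ([], []))
    for state, ref, pred in records:
        grouped[state][0].append(ref)
        grouped[state][1].append(pred)
    return {s: mean_absolute_error(ref, pred) for s, (ref, pred) in grouped.items()}


# Hypothetical sample: (federal state, reference height, predicted height).
sample = [
    ("Bavaria", 10.0, 11.0),
    ("Bavaria", 6.0, 5.0),
    ("Berlin", 20.0, 17.0),
    ("Berlin", 15.0, 16.0),
]

# National-level MAE pools all buildings; state-level MAE is computed per group.
national_mae = mean_absolute_error(
    [ref for _, ref, _ in sample],
    [pred for _, _, pred in sample],
)
state_mae = mae_by_state(sample)
```

Because the national value pools all buildings, it is weighted by building count and need not equal the unweighted average of the state-level values.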