Earth System Science Data (Feb 2025)

A Sentinel-2 machine learning dataset for tree species classification in Germany

  • M. Freudenberg,
  • S. Schnell,
  • P. Magdon

DOI
https://doi.org/10.5194/essd-17-351-2025
Journal volume & issue
Vol. 17
pp. 351 – 367

Abstract

We present a machine learning dataset for tree species classification in Sentinel-2 satellite image time series of bottom-of-atmosphere reflectance. It is geared towards training classifiers but is less suitable for validating the resulting maps. The dataset is based on the German National Forest Inventory of 2012 and on analysis-ready satellite imagery computed using the Framework for Operational Radiometric Correction for Environmental monitoring (FORCE) processing pipeline. From the National Forest Inventory data, we extracted the tree positions, selected 387 775 trees in the upper canopy layer, and automatically extracted the corresponding bottom-of-atmosphere reflectance time series from Sentinel-2 L2A images. These time series are labeled with the corresponding tree species, which enables pixel-wise classification. Furthermore, we provide auxiliary information such as the approximate tree position, the year of possible disturbance events, and the diameter at breast height. Temporally, the dataset spans the period from July 2015 to the end of October 2022, with approximately 75.3 million data points for trees of 48 species and 3 species groups as well as 13.8 million observations for non-tree backgrounds. Spatially, it covers the whole of Germany. The dataset is available at the following DOI (Freudenberg et al., 2024): https://doi.org/10.3220/DATA20240402122351-0.
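
As an illustration of how species-labeled reflectance time series of this kind could be arranged for pixel-wise classification, the following is a minimal Python sketch. The record layout (`Observation` and its field names), the helper `build_time_series`, and all sample values are hypothetical and chosen for illustration only; they do not describe the published file format or the authors' own code.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Observation:
    """One reflectance measurement for one tree (hypothetical layout)."""
    tree_id: int
    species: str
    acquired: date
    reflectance: Tuple[float, ...]  # one value per spectral band


def build_time_series(observations):
    """Group observations per tree and sort each group chronologically.

    Returns a list of (series, label) pairs, where series is a list of
    (date, reflectance) tuples and label is the tree species.
    """
    grouped: Dict[int, List[Observation]] = defaultdict(list)
    labels: Dict[int, str] = {}
    for obs in observations:
        grouped[obs.tree_id].append(obs)
        labels[obs.tree_id] = obs.species

    samples = []
    for tree_id, rows in grouped.items():
        rows.sort(key=lambda o: o.acquired)
        series = [(o.acquired, o.reflectance) for o in rows]
        samples.append((series, labels[tree_id]))
    return samples


# Made-up example values, not taken from the dataset.
demo = [
    Observation(1, "Fagus sylvatica", date(2018, 7, 1), (0.03, 0.05, 0.04)),
    Observation(2, "Picea abies", date(2018, 5, 1), (0.02, 0.04, 0.03)),
    Observation(1, "Fagus sylvatica", date(2018, 5, 1), (0.04, 0.06, 0.05)),
]
samples = build_time_series(demo)
```

Each element of `samples` pairs one tree's chronologically ordered series with its species label, which is the shape a time-series classifier would typically expect as input and target.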