Biogeosciences (Jun 2024)

From simple labels to semantic image segmentation: leveraging citizen science plant photographs for tree species mapping in drone imagery

  • S. Soltani,
  • O. Ferlian,
  • N. Eisenhauer,
  • H. Feilhauer,
  • T. Kattenborn

DOI
https://doi.org/10.5194/bg-21-2909-2024
Journal volume & issue
Vol. 21
pp. 2909 – 2935

Abstract

Knowledge of plant species distributions is essential for various application fields, such as nature conservation, agriculture, and forestry. Remote sensing data, especially high-resolution orthoimages from unoccupied aerial vehicles (UAVs), paired with novel pattern-recognition methods, such as convolutional neural networks (CNNs), enable accurate mapping (segmentation) of plant species. Training transferable pattern-recognition models for species segmentation across diverse landscapes and data characteristics typically requires extensive training data. Training data are usually derived from labor-intensive field surveys or visual interpretation of remote sensing images. Alternatively, pattern-recognition models could be trained more efficiently with plant photos and labels from citizen science platforms, which include millions of crowd-sourced smartphone photos and the corresponding species labels. However, these pairs of citizen-science-based photographs and simple species labels (one label for the entire image) cannot be used directly for training state-of-the-art segmentation models used for UAV image analysis, which require per-pixel labels (also called masks) for training. Here, we overcome the limitation of simple labels of citizen science plant observations with a two-step approach. In the first step, we train CNN-based image classification models using the simple labels and apply them in a moving-window approach over UAV orthoimagery to create segmentation masks. In the second step, these segmentation masks are used to train state-of-the-art CNN-based image segmentation models with an encoder–decoder structure. We tested the approach on UAV orthoimages acquired in summer and autumn at a test site comprising 10 temperate deciduous tree species in varying mixtures. Several tree species could be mapped with surprising accuracy (mean F1 score = 0.47). In homogeneous species assemblages, the accuracy increased considerably (mean F1 score = 0.55).
The results indicate that several tree species can be mapped without generating new training data and by only using preexisting knowledge from citizen science. Moreover, our analysis revealed that the variability in citizen science photographs, with respect to acquisition data and context, facilitates the generation of models that are transferable through the vegetation season. Thus, citizen science data may greatly advance our capacity to monitor hundreds of plant species and, thus, Earth's biodiversity across space and time.
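
The first step of the two-step approach (applying an image classifier in a moving window to produce a segmentation mask) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the tile size, the stride, and the brightness-based `toy_classifier` that stands in for a trained CNN are all assumptions made for illustration.

```python
import numpy as np


def moving_window_mask(image, classify_tile, tile=4, stride=4):
    """Label every window of `image` with the class predicted for that window.

    `classify_tile` is a hypothetical callable that maps an image window to
    one integer class label, as an image classifier with simple labels would.
    """
    height, width = image.shape[:2]
    # -1 marks pixels that no window covered (e.g. at the image border).
    mask = np.full((height, width), -1, dtype=np.int64)
    for top in range(0, height - tile + 1, stride):
        for left in range(0, width - tile + 1, stride):
            window = image[top:top + tile, left:left + tile]
            # Where windows overlap (stride < tile), later windows overwrite
            # earlier ones in this sketch; other aggregation rules are possible.
            mask[top:top + tile, left:left + tile] = classify_tile(window)
    return mask


def toy_classifier(window):
    """Stand-in for a trained CNN: class 1 for bright windows, else class 0."""
    return int(window.mean() > 0.5)


# Toy "orthoimage": dark left half, bright right half.
image = np.zeros((8, 8, 3))
image[:, 4:] = 1.0

mask = moving_window_mask(image, toy_classifier)
```

The resulting `mask` has one class label per pixel, which is the kind of per-pixel training target that an encoder–decoder segmentation model in the second step would require.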