Remote Sensing (Aug 2018)

Machine Learning Using Hyperspectral Data Inaccurately Predicts Plant Traits Under Spatial Dependency

  • Alby D. Rocha,
  • Thomas A. Groen,
  • Andrew K. Skidmore,
  • Roshanak Darvishzadeh,
  • Louise Willemen

DOI: https://doi.org/10.3390/rs10081263
Journal volume & issue: Vol. 10, no. 8, p. 1263

Abstract

Spectral, temporal and spatial dimensions are difficult to model together when predicting in situ plant traits from remote sensing data. Therefore, machine learning algorithms based solely on the spectral dimension are often used as predictors, even when there is a strong effect of spatial or temporal autocorrelation in the data. A significant reduction in prediction accuracy is expected when algorithms are trained using a sequence in space or time that is unlikely to be observed again. The ensuing inability to generalise creates a necessity for ground-truth data for every new area or period, provoking the propagation of “single-use” models. This study assesses the impact of spatial autocorrelation on the generalisation of plant trait models predicted with hyperspectral data. Leaf Area Index (LAI) data generated at increasing levels of spatial dependency are used to simulate hyperspectral data using Radiative Transfer Models. Machine learning regressions to predict LAI at different levels of spatial dependency are then tuned (determining the optimum model complexity) using cross-validation as well as the NOIS method. The results show that cross-validated prediction accuracy tends to be overestimated when spatial structures present in the training data are fitted (or learned) by the model.
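
As a minimal illustration of the effect described in the abstract (not the authors' code or data), the Python sketch below compares random k-fold cross-validation with spatially blocked folds on a simulated, spatially autocorrelated trait. The Gaussian random field standing in for LAI, the noisy bands standing in for Radiative-Transfer-Model spectra, the random forest regressor, and all parameter values are illustrative assumptions.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import KFold, GroupKFold, cross_val_score

    rng = np.random.default_rng(0)

    # Simulate a spatially autocorrelated "trait" (a stand-in for LAI) on a
    # grid, drawn from a Gaussian random field with exponential covariance.
    n_side = 30
    xx, yy = np.meshgrid(np.arange(n_side), np.arange(n_side))
    coords = np.column_stack([xx.ravel(), yy.ravel()]).astype(float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    cov = np.exp(-d / 5.0)  # range parameter: an assumed value
    lai = rng.multivariate_normal(np.zeros(len(coords)), cov)

    # Fake "spectra": noisy copies of the trait, standing in for simulated
    # hyperspectral bands (the paper uses Radiative Transfer Models instead).
    X = np.column_stack([lai + rng.normal(0, 0.3, lai.size) for _ in range(10)])

    model = RandomForestRegressor(n_estimators=200, random_state=0)

    # Random k-fold: nearby, spatially correlated pixels end up in both the
    # training and the test folds, inflating the accuracy estimate.
    r2_random = cross_val_score(
        model, X, lai, cv=KFold(5, shuffle=True, random_state=0)
    ).mean()

    # Spatially blocked folds: contiguous strips of the grid are held out
    # together, so test pixels have no near-duplicates in the training set.
    blocks = coords[:, 0] // (n_side // 5)  # five vertical strips as groups
    r2_blocked = cross_val_score(
        model, X, lai, cv=GroupKFold(5), groups=blocks
    ).mean()

    print(f"random k-fold R^2:  {r2_random:.2f}")
    print(f"blocked k-fold R^2: {r2_blocked:.2f}")  # typically lower

Under these assumptions the blocked estimate is typically lower than the random one, consistent with the abstract's claim that cross-validated accuracy is overestimated when the model fits spatial structures present in the training data.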

Keywords