Hydrology and Earth System Sciences (Sep 2022)

Evaluation of water flux predictive models developed using eddy-covariance observations and machine learning: a meta-analysis

  • H. Shi,
  • G. Luo,
  • O. Hellwich,
  • M. Xie,
  • C. Zhang,
  • Y. Zhang,
  • Y. Wang,
  • X. Yuan,
  • X. Ma,
  • W. Zhang,
  • A. Kurban,
  • P. De Maeyer,
  • T. Van de Voorde

DOI
https://doi.org/10.5194/hess-26-4603-2022
Journal volume & issue
Vol. 26
pp. 4603 – 4618

Abstract

With the rapid accumulation of water flux observations from global eddy-covariance flux sites, many studies have used data-driven approaches to model water fluxes, drawing on a variety of predictors and machine learning algorithms. However, it is unclear how these model features affect prediction accuracy. To fill this gap, we analyzed records of 139 models collected from 32 such studies. Support vector machines (SVMs; average R-squared = 0.82) and random forests (RF; average R-squared = 0.81) outperformed the other evaluated algorithms with sufficient sample sizes in both cross-study and intra-study (same data) comparisons. The average accuracy of models applied in arid regions was higher than that of models applied in other climate types. The average accuracy was slightly lower for forest sites (average R-squared = 0.76) than for croplands and grasslands (average R-squared = 0.80 and 0.79, respectively) but higher than for shrubland sites (average R-squared = 0.67). Using net or solar radiation (Rn/Rs), precipitation, air temperature (Ta), and the fraction of absorbed photosynthetically active radiation (FAPAR) as predictors improved model accuracy. The combined use of Ta and Rn/Rs was very effective, especially in forests, while in grasslands the combination of wind speed (Ws) and Rn/Rs was also effective. Models evaluated with random cross-validation showed higher accuracy than those evaluated with spatial or temporal cross-validation, but spatial cross-validation is more relevant when a model is used for spatial extrapolation. These findings can help guide future research on machine-learning-based modeling of water fluxes.
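
The three validation schemes compared in the abstract differ only in how samples are assigned to training and test sets. The following is a minimal sketch of that difference, not the procedure used in any of the reviewed studies. The `records` list, the site labels, the years, and all function names are hypothetical and chosen purely for illustration. One plausible reason random cross-validation tends to report higher accuracy is that samples from the same site can land in both the training and the test set, whereas the spatial scheme holds out whole sites.

```python
import random
from collections import defaultdict

# Hypothetical records: (site_id, year) pairs standing in for flux samples.
records = [(site, year)
           for site in ("A", "B", "C", "D")
           for year in (2015, 2016, 2017)]


def random_folds(n_samples, k, seed=0):
    """Random CV: shuffle sample indices and deal them into k test folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [sorted(idx[i::k]) for i in range(k)]


def spatial_folds(records):
    """Spatial CV: each test fold holds every sample from exactly one site."""
    by_site = defaultdict(list)
    for i, (site, _year) in enumerate(records):
        by_site[site].append(i)
    return [by_site[site] for site in sorted(by_site)]


def temporal_split(records, first_test_year):
    """Temporal CV: train on earlier years, test on later ones."""
    train = [i for i, (_site, year) in enumerate(records)
             if year < first_test_year]
    test = [i for i, (_site, year) in enumerate(records)
            if year >= first_test_year]
    return train, test
```

In each case the training set for a fold is simply every index not in that fold's test set. Libraries such as scikit-learn provide equivalent splitters; the plain-Python version here is only meant to make the grouping logic explicit.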