Ecological Indicators (Jun 2021)
A method to avoid spatial overfitting in estimation of grassland above-ground biomass on the Tibetan Plateau
Abstract
Accurate assessment of grassland above-ground biomass (AGB) is crucial for the sustainable use and protection of grassland resources and the eco-environment. In this study, a random forest (RF) model combined with forward feature selection (FFS) and leave-location-out cross-validation (LLO-CV) was trained to predict the dry weight (DW) of grassland AGB from multiple factors. The final model achieved R² = 0.66, a root mean square error (RMSE) of 503.86 kg DW/ha and a mean absolute error (MAE) of 376.51 kg DW/ha. From 2001 to 2018, grassland AGB across the Tibetan Plateau (TP) increased along a northwest-to-southeast gradient. Grassland AGB increased more than it decreased (70.6% vs. 29.4%) during the study period. Combining FFS with LLO-CV reduced spatial overfitting and improved the predictive accuracy of the RF model, enhancing its ability to predict AGB at unknown locations from training data. This study proposes a robust methodology for improving the transferability of machine learning algorithms when predicting grassland AGB at unknown locations.
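The combination of RF, forward selection and leave-location-out validation described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes scikit-learn, uses synthetic data with hypothetical site labels and predictors, and substitutes scikit-learn's generic `SequentialFeatureSelector` for the FFS procedure used in the study. The key idea it shows is that all plots from one location are held out together, so both feature selection and error estimates reflect prediction at unseen locations.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GroupKFold, cross_val_predict

rng = np.random.default_rng(0)

# Synthetic stand-in data (hypothetical): 20 sampling sites with 10 plots each.
n_sites, plots_per_site, n_features = 20, 10, 6
site_id = np.repeat(np.arange(n_sites), plots_per_site)
X = rng.normal(size=(n_sites * plots_per_site, n_features))
# By construction, only predictors 0 and 1 carry signal; units mimic kg DW/ha.
y = 1500 + 400 * X[:, 0] + 250 * X[:, 1] + rng.normal(scale=100, size=X.shape[0])

# Leave-location-out folds: every plot of a site falls in the same fold,
# so a site is never split between training and validation data.
llo_splits = list(GroupKFold(n_splits=5).split(X, y, groups=site_id))

rf = RandomForestRegressor(n_estimators=50, random_state=0)

# Forward selection scored on the location-wise folds (assumed target: 2 features).
selector = SequentialFeatureSelector(
    rf,
    n_features_to_select=2,
    direction="forward",
    scoring="neg_root_mean_squared_error",
    cv=llo_splits,
)
selector.fit(X, y)
selected = np.flatnonzero(selector.get_support())

# Out-of-location predictions for the selected predictors, then the three
# metrics reported in the abstract.
y_pred = cross_val_predict(rf, X[:, selected], y, cv=llo_splits)
r2 = r2_score(y, y_pred)
rmse = float(np.sqrt(mean_squared_error(y, y_pred)))
mae = float(mean_absolute_error(y, y_pred))

print("selected predictors:", selected)
print(f"R2={r2:.2f}  RMSE={rmse:.1f}  MAE={mae:.1f}")
```

Because the same folds are used here for both selection and evaluation, the reported scores are likely somewhat optimistic; a fully independent set of held-out locations would give a stricter estimate.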