Ecological Informatics (Sep 2024)

Explainable machine learning-based fractional vegetation cover inversion and performance optimization – A case study of an alpine grassland on the Qinghai-Tibet Plateau

  • Xinhong Li,
  • Jianjun Chen,
  • Zizhen Chen,
  • Yanping Lan,
  • Ming Ling,
  • Qinyi Huang,
  • Hucheng Li,
  • Xiaowen Han,
  • Shuhua Yi

Journal volume & issue
Vol. 82
p. 102768

Abstract

Fractional Vegetation Cover (FVC) is a crucial indicator for ecological sustainability and climate change monitoring. While machine learning is the primary method for FVC inversion, shortcomings remain in feature selection, hyperparameter tuning, the handling of underlying surface heterogeneity, and explainability. To address these challenges, this study leveraged extensive FVC field data from the Qinghai-Tibet Plateau. First, a feature selection algorithm combining genetic algorithms and XGBoost was proposed. This algorithm was integrated with the Optuna tuning method, forming the GA-OP combination to optimize feature selection and hyperparameter tuning in machine learning. Next, various machine learning models for FVC inversion in alpine grassland were compared, and the impact of underlying surface heterogeneity on inversion performance was investigated using the NDVI Coefficient of Variation (NDVI-CV). Lastly, the SHAP (Shapley Additive exPlanations) method was employed for both global and local interpretations of the optimal model. The results indicated that: (1) The GA-OP combination exhibited favorable performance in terms of computational cost and inversion accuracy, with Optuna demonstrating significant potential in hyperparameter tuning. (2) The Stacking model achieved the best performance in FVC inversion for alpine grassland among the seven models (R² = 0.867, RMSE = 0.12, RPD = 2.552, BIAS = −0.0005, VAR = 0.014), with the performance ranking as follows: Stacking > CatBoost > XGBoost > LightGBM > RFR > KNN > SVR. (3) NDVI-CV enhanced inversion performance and result reliability by excluding data from highly heterogeneous regions, where FVC tended to be either overestimated or underestimated. (4) SHAP revealed the decision-making processes of the Stacking and CatBoost models from both global and local perspectives. This allowed for a deeper exploration of the causality between features and targets.
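The NDVI-CV screening in result (3) can be illustrated with a minimal sketch. The abstract does not give the window size or the cut-off used, so the function names, the 2 × 2 windows, and the threshold `max_cv=0.15` below are hypothetical; the sketch only assumes that NDVI-CV is the standard deviation of NDVI within a window divided by its mean.

```python
import numpy as np


def ndvi_cv(ndvi_window):
    """Coefficient of variation (std / mean) of the NDVI values in one window."""
    values = np.asarray(ndvi_window, dtype=float).ravel()
    values = values[np.isfinite(values)]  # ignore NaN / inf pixels
    if values.size == 0:
        return float("nan")
    mean = values.mean()
    if mean == 0:
        return float("nan")
    return float(values.std() / abs(mean))


def keep_homogeneous(windows, max_cv=0.15):
    """Boolean mask: True where a window's NDVI-CV is at or below max_cv.

    max_cv is a hypothetical threshold chosen for illustration only.
    Windows with an undefined CV (NaN) are marked False.
    """
    cvs = np.array([ndvi_cv(w) for w in windows])
    return np.nan_to_num(cvs, nan=np.inf) <= max_cv


# Two illustrative windows: one nearly uniform, one strongly mixed.
homogeneous = [[0.50, 0.52], [0.51, 0.49]]
heterogeneous = [[0.10, 0.80], [0.20, 0.70]]
mask = keep_homogeneous([homogeneous, heterogeneous])
```

Samples whose mask entry is False would be dropped before training or validation under this assumed scheme.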
This study developed a high-precision FVC inversion scheme, successfully achieving accurate FVC inversion on the Qinghai-Tibet Plateau. The proposed approach provides valuable references for other ecological parameter inversions.
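The abstract reports five accuracy metrics without defining them. The sketch below uses commonly used definitions, which are assumptions here: RPD as the sample standard deviation of the observations divided by RMSE, BIAS as the mean of predicted minus observed, and VAR as the variance of the prediction errors. The function name `fvc_metrics` and the sample values are hypothetical.

```python
import numpy as np


def fvc_metrics(observed, predicted):
    """Return R2, RMSE, RPD, BIAS and VAR under the assumed definitions."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    err = pred - obs                      # prediction error per sample

    rmse = float(np.sqrt(np.mean(err ** 2)))
    ss_res = float(np.sum(err ** 2))      # residual sum of squares
    ss_tot = float(np.sum((obs - obs.mean()) ** 2))

    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": rmse,
        "RPD": float(np.std(obs, ddof=1)) / rmse,  # assumed: sample SD / RMSE
        "BIAS": float(np.mean(err)),               # assumed: mean(pred - obs)
        "VAR": float(np.var(err)),                 # assumed: variance of errors
    }


# Illustrative values only, not data from the study.
observed = [0.1, 0.3, 0.5, 0.7, 0.9]
predicted = [0.2, 0.3, 0.4, 0.7, 1.0]
metrics = fvc_metrics(observed, predicted)
```

Under these definitions RMSE² = BIAS² + VAR. The reported values are consistent with that reading, since 0.12² = 0.0144 is close to (−0.0005)² + 0.014.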

Keywords