Using Shapley additive explanations to interpret extreme gradient boosting predictions of grassland degradation in Xilingol, China

Batunacun; R. Wieland; T. Lakes; C. Nendel

doi:10.5194/gmd-14-1493-2021

Geoscientific Model Development (Mar 2021)

Affiliations

Batunacun: Department of Geography, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany
Batunacun: Leibniz Centre for Agricultural Landscape Research (ZALF), Eberswalder Straße 84, 15374 Müncheberg, Germany
R. Wieland: Leibniz Centre for Agricultural Landscape Research (ZALF), Eberswalder Straße 84, 15374 Müncheberg, Germany
T. Lakes: Department of Geography, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany
T. Lakes: Integrative Research Institute on Transformations of Human-Environment Systems, Humboldt-Universität zu Berlin, Friedrichstraße 191, 10099 Berlin, Germany
C. Nendel: Leibniz Centre for Agricultural Landscape Research (ZALF), Eberswalder Straße 84, 15374 Müncheberg, Germany
C. Nendel: Integrative Research Institute on Transformations of Human-Environment Systems, Humboldt-Universität zu Berlin, Friedrichstraße 191, 10099 Berlin, Germany

DOI: https://doi.org/10.5194/gmd-14-1493-2021
Journal volume & issue: Vol. 14
pp. 1493 – 1510

Abstract

Machine learning (ML) and data-driven approaches are increasingly used in many research areas. Extreme gradient boosting (XGBoost) is a tree boosting method that has evolved into a state-of-the-art approach for many ML challenges. However, it has rarely been used in simulations of land-use change so far. Xilingol, a typical region for research on serious grassland degradation and its drivers, was selected as a case study to test whether XGBoost can provide alternative insights that conventional land-use models are unable to generate. A set of 20 drivers was analysed using XGBoost, involving four alternative sampling strategies, and SHAP (Shapley additive explanations) to interpret the results of the purely data-driven approach. The results indicated that, with three of the sampling strategies (over-balanced, balanced, and imbalanced), XGBoost achieved similar and robust simulation results. SHAP values were useful for analysing the complex relationship between the different drivers of grassland degradation. Four drivers accounted for 99 % of the grassland degradation dynamics in Xilingol. These four drivers were spatially allocated, and a risk map of further degradation was produced. The limitations of using XGBoost to predict future land-use change are discussed.
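
The idea behind the Shapley values that SHAP builds on can be illustrated with a minimal sketch. This is not the authors' workflow: it is a brute-force computation for a hypothetical three-driver scoring function, using only the Python standard library. The function `toy_model`, the driver values, and the all-zero baseline are assumptions made purely for illustration. The SHAP library uses much faster algorithms for tree ensembles such as XGBoost.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample x, relative to a baseline sample.

    Brute force over all feature subsets, so only practical for a few features.
    """
    n = len(x)
    features = range(n)

    def value(subset):
        # Features in the subset take their value from x; the rest stay at the baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in features]
        return predict(mixed)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size that excludes feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(v):
    # Hypothetical score: one additive driver plus an interaction of two others.
    return 2.0 * v[0] + v[1] * v[2]


x = [1.0, 2.0, 3.0]         # hypothetical driver values for one grid cell
baseline = [0.0, 0.0, 0.0]  # hypothetical reference point

phi = shapley_values(toy_model, x, baseline)
print([round(p, 6) for p in phi])
```

In this toy case the additive driver receives a contribution of 2, and the two interacting drivers share their joint effect of 6 equally (3 each). The contributions sum to the difference between the prediction for `x` and the prediction for the baseline, which is the additivity property that makes such values convenient for ranking drivers.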

Published in Geoscientific Model Development

ISSN: 1991-959X (Print); 1991-9603 (Online)
Publisher: Copernicus Publications
Country of publisher: Germany
LCC subjects: Science: Geology
Website: https://www.geoscientific-model-development.net/
