Biplots for understanding machine learning predictions in digital soil mapping

Stephan van der Westhuizen; Gerard B.M. Heuvelink; Sugnet Gardner-Lubbe; Catherine E. Clarke

Ecological Informatics (Dec 2024)

Affiliations

Stephan van der Westhuizen: Department of Statistics and Actuarial Science, Stellenbosch University, Stellenbosch, South Africa; Soil Geography and Landscape Group, Wageningen University, Wageningen, The Netherlands; Centre for Multi-dimensional Data Visualisation (MuViSU), Stellenbosch University, Stellenbosch, South Africa; Corresponding author at: Department of Statistics and Actuarial Science, Stellenbosch University, Stellenbosch, South Africa.
Gerard B.M. Heuvelink: Soil Geography and Landscape Group, Wageningen University, Wageningen, The Netherlands; ISRIC-World Soil Information, Wageningen, The Netherlands
Sugnet Gardner-Lubbe: Department of Statistics and Actuarial Science, Stellenbosch University, Stellenbosch, South Africa; Centre for Multi-dimensional Data Visualisation (MuViSU), Stellenbosch University, Stellenbosch, South Africa
Catherine E. Clarke: Department of Soil Science, Stellenbosch University, Stellenbosch, South Africa

Journal volume & issue: Vol. 84, p. 102892

Abstract

In digital soil mapping, machine learning models are gradually replacing traditional statistical models because of their greater flexibility and better prediction performance. However, unlike traditional models, machine learning models have a notable drawback: they are “black-box” in nature, offering limited ability to interpret their predictions. Explainable machine learning (XML) methods provide visualisations that aid in understanding predictions made by machine learning models. Popular model-agnostic visualisation methods include partial dependence plots, individual conditional expectation curves, and partial dependence plots produced with Shapley values. These methods require that covariates are uncorrelated, which can be restrictive. For cases where covariates are correlated, an alternative approach is the Accumulated Local Effect plot, which, however, is limited to depicting one or two covariates at a time. Another disadvantage of the above-mentioned methods is that they offer no readily available goodness-of-fit metric. In this paper we propose the use of a principal component analysis biplot as a model-agnostic method to gain insight into machine learning predictions in digital soil mapping. A biplot is a powerful visualisation tool used to seek patterns in multivariate data. It does not require the covariates included in the visualisation to be uncorrelated, and it provides an analytically derived goodness-of-fit metric that allows the user to evaluate the accuracy of the approximation. We present examples from a case study in South Africa in which soil organic carbon is mapped with a random forest model. Our findings show that biplots can provide meaningful interpretations for predictions, making them a worthy addition to the XML toolkit.
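
As a rough illustration of the idea in the abstract, the sketch below computes the ingredients of a two-dimensional PCA biplot (sample coordinates and variable axis directions) together with the standard PCA quality-of-display ratio, which plays the role of a goodness-of-fit measure for the approximation. It is a minimal sketch under stated assumptions and not the authors' implementation: the function name `pca_biplot`, the synthetic covariates, and the choice to append the model prediction as an extra column are all hypothetical.

```python
import numpy as np


def pca_biplot(X, n_dims=2):
    """Return sample scores, variable axis directions, and quality of display.

    Hypothetical helper for illustration; not taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                     # centre each column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_dims] * s[:n_dims]         # sample coordinates in the plot
    axes = Vt[:n_dims].T                        # one direction vector per variable
    # Share of total variance captured by the displayed dimensions.
    quality = float(np.sum(s[:n_dims] ** 2) / np.sum(s ** 2))
    return scores, axes, quality


# Synthetic stand-in data (assumption): three covariates, two of them
# strongly correlated, plus a made-up "prediction" column.
rng = np.random.default_rng(0)
n = 200
elevation = rng.normal(size=n)
rainfall = 0.8 * elevation + 0.2 * rng.normal(size=n)   # correlated covariate
ndvi = rng.normal(size=n)
prediction = 0.5 * rainfall + 0.3 * ndvi + 0.1 * rng.normal(size=n)

Z = np.column_stack([elevation, rainfall, ndvi, prediction])
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)        # standardise before PCA

scores, axes, quality = pca_biplot(Z)
print(f"quality of display (2D): {quality:.3f}")
```

Plotting `scores` as points and each row of `axes` as a line through the origin gives the biplot. The correlated covariates pose no difficulty here because PCA works on the joint covariance structure, and `quality` indicates how much of the total variation the two-dimensional display retains.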

Published in Ecological Informatics

ISSN: 1574-9541 (Print); 1878-0512 (Online)
Publisher: Elsevier
Country of publisher: Netherlands
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Science: Biology (General): Ecology
Website: https://www.sciencedirect.com/journal/ecological-informatics
