Ecological Informatics (Dec 2024)
Biplots for understanding machine learning predictions in digital soil mapping
Abstract
In digital soil mapping, machine learning models are gradually replacing traditional statistical models because of their greater flexibility and better prediction performance. However, unlike traditional models, machine learning models have a notable drawback: they are “black-box” in nature, with limited ability to provide comprehensive interpretations of their predictions. Explainable machine learning (XML) methods provide visualisations that aid in understanding predictions made by machine learning models. Popular model-agnostic visualisation methods include partial dependence plots, individual conditional expectation curves, and partial dependence plots produced with Shapley values. These methods require that covariates are uncorrelated, which could be restrictive. For cases where covariates are correlated, an alternative approach is the Accumulated Local Effect plot, which, however, is limited to depicting one or two covariates at a time. Another disadvantage of the above-mentioned methods is that they offer no readily available goodness-of-fit metric. In this paper, we propose the use of a principal component analysis biplot as a model-agnostic method to gain insight into machine learning predictions in digital soil mapping. A biplot is a powerful visualisation tool used to seek patterns in multivariate data. A biplot does not require the covariates included in the visualisation to be uncorrelated, and furthermore, it provides an analytically derived goodness-of-fit metric that allows the user to evaluate the accuracy of the approximation. We present examples from a case study in South Africa in which soil organic carbon is mapped with a random forest model. Our findings show that biplots can provide meaningful interpretations for predictions, making them a worthy addition to the XML toolkit.
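To make the idea of a principal component analysis biplot and its goodness-of-fit concrete, the following is a minimal sketch in Python using only NumPy. It is not the authors' implementation. The function name `pca_biplot`, the synthetic covariates (`elevation`, `rainfall`, `ndvi`), and the stand-in `predicted_soc` column are all hypothetical, and appending the predictions as an extra column next to the covariates is an assumption about how a biplot might be applied to model output. The goodness-of-fit shown is the standard quality of display of a two-dimensional PCA approximation: the share of total variance captured by the first two principal components.

```python
import numpy as np

def pca_biplot(X, n_dims=2):
    """Return sample coordinates, variable axis directions and quality of display.

    Hypothetical helper for illustration; columns are centred and scaled to
    unit variance before the singular value decomposition.
    """
    X = np.asarray(X, dtype=float)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # SVD of the standardised data: Xs = U diag(s) Vt
    U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    V_r = Vt[:n_dims].T              # (p, n_dims): directions of the biplot axes
    samples = Xs @ V_r               # (n, n_dims): sample points in the biplot
    # Proportion of total variance reproduced in n_dims dimensions
    quality = np.sum(s[:n_dims] ** 2) / np.sum(s ** 2)
    return samples, V_r, quality

# Synthetic data only; the values carry no real-world meaning.
rng = np.random.default_rng(0)
n = 200
elevation = rng.normal(size=n)
rainfall = 0.8 * elevation + 0.6 * rng.normal(size=n)   # correlated with elevation
ndvi = rng.normal(size=n)
# Stand-in for predictions from a fitted model (assumption, not a real model)
predicted_soc = 1.5 * rainfall + 0.5 * ndvi + 0.1 * rng.normal(size=n)

X = np.column_stack([elevation, rainfall, ndvi, predicted_soc])
samples, axes, quality = pca_biplot(X)

print("sample coordinates shape:", samples.shape)
print("axis directions (rows = variables):")
print(np.round(axes, 3))
print("quality of display:", round(float(quality), 3))
```

Plotting `samples` as points and each row of `axes` as a line through the origin would give the familiar biplot display. A quality value close to 1 indicates that the two-dimensional picture reproduces the standardised data well, whereas a low value signals that patterns read off the plot should be treated with caution.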