SOIL (Sep 2024)
An ensemble estimate of Australian soil organic carbon using machine learning and process-based modelling
Abstract
Spatially explicit prediction of soil organic carbon (SOC) serves as a crucial foundation for effective land management strategies aimed at mitigating soil degradation and assessing carbon sequestration potential. Here, using more than 1000 in situ observations, we trained two machine learning models (a random forest model and a k-means coupled with multiple linear regression model) and one process-based model (the vertically resolved MIcrobial-MIneral Carbon Stabilization, MIMICS, model) to predict the SOC stocks of the top 30 cm of soil in Australia. Parameters of MIMICS were optimised for different site groupings using two distinct approaches: plant functional types (MIMICS-PFT) and the most influential environmental factors (MIMICS-ENV). All models performed well with respect to SOC predictions, with an R² value greater than 0.8 during out-of-sample validation, and random forest was the most accurate. SOC in forests was more predictable than SOC in non-forest soils (excluding croplands). MIMICS-ENV gave better continental-scale SOC predictions than MIMICS-PFT, especially in non-forest soils. Digital maps of terrestrial SOC stocks generated using all of the models showed a similar spatial distribution, with higher values in south-eastern and south-western Australia, but the magnitude of the estimated SOC stocks varied. The mean ensemble estimate of SOC stocks was 30.3 t ha⁻¹, with k-means coupled with multiple linear regression generating the highest estimate (mean SOC stocks of 38.15 t ha⁻¹) and MIMICS-PFT generating the lowest estimate (mean SOC stocks of 24.29 t ha⁻¹). We suggest that enhancing process-based models to incorporate newly identified drivers that significantly influence SOC variation in different environments could be the key to reducing the discrepancies in these estimates.
Our findings underscore the considerable uncertainty in SOC estimates derived from different modelling approaches and emphasise the importance of rigorous out-of-sample validation before applying any one approach in Australia.
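As a rough illustration of what a mean ensemble estimate and its between-model spread involve, the following minimal sketch averages per-cell SOC predictions across models and reports the range between the highest and lowest model. The function name `ensemble_summary`, the model keys, and all numbers are hypothetical toy values chosen for readability; they are not the study's data or code, and the abstract does not state that the ensemble was computed this way (for example, whether models were weighted equally).

```python
from statistics import mean


def ensemble_summary(predictions):
    """Return per-cell ensemble mean and between-model range.

    predictions: dict mapping a model name to a list of SOC stocks
    (t ha^-1), one value per grid cell. All lists must be the same length.
    Equal weighting of models is an assumption of this sketch.
    """
    lengths = {len(values) for values in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all models must predict the same grid cells")
    # Regroup so that each tuple holds every model's value for one cell.
    cells = list(zip(*predictions.values()))
    ens_mean = [mean(cell) for cell in cells]
    ens_range = [max(cell) - min(cell) for cell in cells]
    return ens_mean, ens_range


# Toy values for three grid cells; NOT results from the paper.
toy = {
    "random_forest": [30.0, 45.0, 12.0],
    "kmeans_mlr": [38.0, 50.0, 16.0],
    "mimics_pft": [24.0, 40.0, 8.0],
    "mimics_env": [28.0, 45.0, 12.0],
}
ens_mean, ens_range = ensemble_summary(toy)
print(ens_mean)   # per-cell ensemble mean
print(ens_range)  # per-cell spread between highest and lowest model
```

A per-cell range like this is one simple way to express how much the estimates disagree; a standard deviation or a validation-weighted mean would be equally reasonable alternatives.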