Ecological Indicators (Oct 2022)
Soil total and organic carbon mapping and uncertainty analysis using machine learning techniques
Abstract
Soil carbon is the largest terrestrial carbon pool. Reliable mapping of soil organic carbon (SOC) and soil total carbon (STC) content is essential for agricultural ecosystem management and carbon accounting under global warming. This fine-scale study aimed to predict the spatial distribution of SOC and STC, map the associated uncertainty, and quantify the contribution of the environmental variables that affect their variability. Three machine learning models, namely Random Forest (RF), eXtreme Gradient Boosting (XGBoost), and Random Forest plus residuals Kriging (RFRK), were developed using 4345 agricultural topsoil samples and 16 environmental covariates. Mean absolute error (MAE), root mean square error (RMSE), the Nash–Sutcliffe model efficiency coefficient (NSE), and Lin’s concordance correlation coefficient (LCCC) were used to evaluate global prediction accuracy. Accuracy plots were used to quantify the uncertainty (i.e., local accuracy) of the SOC and STC predictions. RFRK performed best, with MAE, RMSE, NSE, and LCCC of 20.76%, 27.61%, 0.39, and 0.56 for SOC and 27.20%, 38.59%, 0.35, and 0.53 for STC, respectively. RF outperformed XGBoost in terms of NSE (0.33 vs 0.29 for SOC and 0.36 vs 0.32 for STC). Accuracy plots showed that RFRK produced higher local accuracy than RF in quantifying the prediction uncertainty of both SOC and STC. XGBoost performed very well in the uncertainty estimation of SOC. Land use type, mean annual Normalized Difference Vegetation Index, and elevation were the three most important indicators of the spatial variability of SOC and STC. These results could support soil carbon monitoring in areas of complex terrain.
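As an illustration of the four global accuracy measures named in the abstract, the following is a minimal sketch using their standard textbook definitions. The function name `global_accuracy` and the sample values are hypothetical and are not taken from the study. The abstract reports MAE and RMSE as percentages; how they were normalized is not stated here, so this sketch returns them in the units of the input data.

```python
import numpy as np

def global_accuracy(observed, predicted):
    """Return MAE, RMSE, NSE, and Lin's CCC for paired observations and predictions."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    err = pred - obs

    # Mean absolute error and root mean square error, in the units of the data.
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))

    # Nash-Sutcliffe efficiency: 1 minus the ratio of the residual sum of squares
    # to the sum of squares of the observations about their mean.
    nse = 1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2)

    # Lin's concordance correlation coefficient, using population (1/n) moments.
    cov = np.mean((obs - obs.mean()) * (pred - pred.mean()))
    lccc = 2.0 * cov / (obs.var() + pred.var() + (obs.mean() - pred.mean()) ** 2)

    return {"MAE": mae, "RMSE": rmse, "NSE": nse, "LCCC": lccc}

# Hypothetical values, for illustration only.
scores = global_accuracy([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

An NSE of 1 and an LCCC of 1 both indicate perfect agreement. NSE falls below 0 when the predictions are worse than simply using the mean of the observations, which is the case for the constantly biased toy predictions above.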