Ecological Indicators (Oct 2021)
Incorporation of high accuracy surface modeling into machine learning to improve soil organic matter mapping
Abstract
Digital soil mapping of soil organic matter (SOM) is crucial for quantifying carbon cycling in terrestrial ecosystems and, in turn, for managing soil fertility. Many recent studies have compared machine learning (ML) models with traditional statistical models in digital soil mapping. However, few have examined hybrid models that combine ML with statistical models to map SOM content, especially in loess areas, which have a complicated geomorphologic landscape. In this study, hybrid models were implemented to map SOM content in the Dongzhi Loess Tableland area of China. The trend was predicted with two ML models, gradient boosting modeling and random forest (RF), and with a traditional stepwise multiple linear regression; the residuals were interpolated with two classic geostatistical models, ordinary kriging and inverse distance weighting, and with high accuracy surface modeling (HASM). A total of 145 topsoil samples and heterogeneous environmental variables were collected to develop the hybrid models. Results showed that 18 variables related to soil properties, climate, terrain attributes, vegetation indices, and location attributes played an important role in SOM mapping. Models that combined ML algorithms with interpolated residuals were found to handle complex environmental relationships better. HASM outperformed the traditional geostatistical models in interpolating the residuals. Overall, RF combined with HASM residuals (RF_HASM) gave the best performance, with the lowest mean absolute error (1.69 g/kg) and root mean square error (2.30 g/kg) and the highest coefficient of determination (0.57) and concordance correlation coefficient (0.69). Moreover, the SOM map produced by RF_HASM better fit the actual spatial distribution pattern of the study area.
In conclusion, these results suggest that RF_HASM is particularly capable of improving the mapping accuracy of SOM content at the regional scale.
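The hybrid structure described in the abstract (a trend model plus spatially interpolated residuals) can be illustrated with a minimal sketch. It is not the study's implementation: a single-covariate least-squares line stands in for the trend model, inverse distance weighting interpolates the residuals (HASM is not shown), and all function names and sample values are hypothetical. Lin's concordance correlation coefficient, one of the reported accuracy measures, is included for reference.

```python
import math


def fit_linear_trend(covariate, target):
    """Least-squares line target ~ a + b * covariate (stand-in trend model)."""
    n = len(covariate)
    mean_x = sum(covariate) / n
    mean_y = sum(target) / n
    sxx = sum((x - mean_x) ** 2 for x in covariate)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(covariate, target))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def idw(coords, values, point, power=2.0):
    """Inverse distance weighting of `values` observed at `coords`."""
    num = 0.0
    den = 0.0
    for (cx, cy), v in zip(coords, values):
        d = math.hypot(point[0] - cx, point[1] - cy)
        if d == 0.0:
            return v  # exact interpolator: reproduce the sampled value
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den


def hybrid_predict(coords, covariate, som, point, point_covariate):
    """Prediction = trend at the point + interpolated trend residual."""
    intercept, slope = fit_linear_trend(covariate, som)
    residuals = [s - (intercept + slope * c) for s, c in zip(som, covariate)]
    trend = intercept + slope * point_covariate
    return trend + idw(coords, residuals, point)


def lin_ccc(observed, predicted):
    """Lin's concordance correlation coefficient (population moments)."""
    n = len(observed)
    mo = sum(observed) / n
    mp = sum(predicted) / n
    var_o = sum((o - mo) ** 2 for o in observed) / n
    var_p = sum((p - mp) ** 2 for p in predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted)) / n
    return 2.0 * cov / (var_o + var_p + (mo - mp) ** 2)


# Made-up sample data: coordinates, one covariate, SOM content in g/kg.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
covariate = [1.0, 2.0, 3.0, 4.0]
som = [10.0, 12.5, 13.0, 16.0]

# Predict at an unsampled location with an assumed covariate value.
estimate = hybrid_predict(coords, covariate, som, (0.5, 0.5), 2.5)
```

Because the residual interpolator is exact at sampled locations, the hybrid prediction reproduces each observation there; in practice, accuracy measures such as those reported above would be computed on held-out samples rather than on the training points.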