Journal of Water and Soil (Dec 2021)

Modeling the Vertical Soil Calcium Carbonate Equivalent Variation by Machine Learning Algorithms in Qazvin Plain

  • S.R. Mousavi,
  • F. Sarmadian,
  • M. Omid,
  • P. Bogaert

DOI
https://doi.org/10.22067/jsw.2021.71748.1076
Journal volume & issue
Vol. 35, no. 5
pp. 719 – 734

Abstract


Introduction: Calcium Carbonate Equivalent (CCE) is one of the key soil properties in arid and semi-arid regions, and studying the spatial variability of surface and subsurface layers is important for the sustainable management of arable soils. This study aimed to model the spatial distribution of CCE percentage with three machine learning algorithms, Random Forest (RF), Decision Tree regression (DTr), and k-Nearest Neighbor (k-NN), at five standard depths: 0-5, 5-15, 15-30, 30-60, and 60-100 cm.

Materials and Methods: The 60,000 ha study area covers the major part of the Qazvin plain, on the border of Qazvin and Alborz provinces. In the field and laboratory surveys, 278 representative profiles were excavated and described by horizon, and their physicochemical properties were determined. The studied soils are highly diverse in soil moisture regime (Aridic, Xeric, and Aquic) and have a Thermic soil temperature regime. This variation has led to the formation of eight great groups in the region according to the USDA soil classification system, of which Haploxerepts, Calcixerepts, and Haplocalcids are dominant. A total of 22 environmental covariates were prepared: 12 primary and secondary derivatives of a digital elevation model (DEM), six remote sensing (RS) indices, two climatic parameters, and two soil covariates. The most appropriate covariates were then selected using principal component analysis (PCA) and expert knowledge.
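The sketch below illustrates the variance-threshold part of the PCA screening. It makes three assumptions: the covariates are standardized, so PCA runs on their correlation matrix; components are retained until their cumulative share of variance exceeds 80%, the figure reported in the Results; and the toy matrices stand in for the 22 covariates. The abstract does not describe how each component was mapped to a single covariate, so the sketch stops at the eigenvalues and the retained-component count. The function names (`correlation_matrix`, `jacobi_eigenvalues`, `components_for_threshold`) are hypothetical. In practice, a library routine such as `numpy.linalg.eigvalsh` would replace the hand-written Jacobi solver.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def correlation_matrix(data: Matrix) -> Matrix:
    """Pearson correlation matrix of the columns of `data` (rows = samples).

    Assumes every column (covariate) has non-zero variance.
    """
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in data) / (n - 1))
           for j in range(p)]
    z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in data]
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
            for i in range(p)]


def jacobi_eigenvalues(a: Matrix, tol: float = 1e-12, max_sweeps: int = 100) -> List[float]:
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in a]  # work on a copy
    p = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if abs(a[i][j]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (i, j) entry becomes zero.
                theta = (a[j][j] - a[i][i]) / (2.0 * a[i][j])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(p):  # A <- A J (columns i and j)
                    aki, akj = a[k][i], a[k][j]
                    a[k][i] = c * aki - s * akj
                    a[k][j] = s * aki + c * akj
                for k in range(p):  # A <- J^T A (rows i and j)
                    aik, ajk = a[i][k], a[j][k]
                    a[i][k] = c * aik - s * ajk
                    a[j][k] = s * aik + c * ajk
    return sorted((a[i][i] for i in range(p)), reverse=True)


def components_for_threshold(eigenvalues: List[float], threshold: float = 0.80) -> Tuple[int, float]:
    """Smallest number of leading components whose cumulative variance share exceeds `threshold`."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, ev in enumerate(eigenvalues, start=1):
        cumulative += ev
        if cumulative / total > threshold:
            return k, cumulative / total
    return len(eigenvalues), 1.0
```

For example, `jacobi_eigenvalues([[2.0, 1.0], [1.0, 2.0]])` returns approximately `[3.0, 1.0]`, so in that two-covariate case the first component alone carries 75% of the variance.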

The CCE percentage data were randomly split into a training set (80%) and a test set (20%), modeled with the RF, DTr, and k-NN algorithms, and evaluated using the coefficient of determination (R2), root mean square error (RMSE), and bias.

Results and Discussion: Harmonizing the CCE values of the genetic horizons to the standard depths showed that the spline depth function was highly efficient, providing acceptable estimates with minimal error and maximal agreement between observed and predicted values. PCA showed that the first five components (those with the highest eigenvalues), which together explained more than 80% of the cumulative variance, were represented, in order, by the Multi-Resolution Index of Valley Bottom Flatness (MrVBF), Mean Annual Temperature (MAT), the Greenness index (Greenness), the probability of a calcic horizon (Cal.hr), and Wind Effect. In addition, clay content (Clay) was selected on the basis of expert knowledge. The relative importance (RI) of the covariates showed that the spatial distribution of CCE at the three upper depths was affected mainly by Clay, which explained more than 57%, 41.8%, and 45% of its variance at 0-5, 5-15, and 15-30 cm, respectively. At the two deeper depths, Cal.hr had the greatest influence on the spatial prediction of CCE among the auxiliary variables, explaining 67.8% and 52.8% at 30-60 and 60-100 cm, respectively. Hence, using the calcic horizon probability map (Cal.hr) as a derived soil covariate produced more appropriate final maps and prevented a loss of modeling accuracy in the subsoil. The remote sensing covariate, Greenness, had no significant influence on the variation of CCE percentage at any of the studied depths.
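The sketch below shows the random 80/20 partition and the three evaluation indices named in the Methods. It is a minimal illustration under stated assumptions, not the authors' code:

- R2 is taken as 1 - SS_res/SS_tot. Some studies instead report the squared Pearson correlation.
- Bias is the mean of predicted minus observed values.
- The split is made over the 278 profiles with a fixed random seed, which is a hypothetical choice.

All helper names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def train_test_split(n: int, test_fraction: float = 0.2, seed: int = 42) -> Tuple[List[int], List[int]]:
    """Randomly partition indices 0..n-1 into (train, test); the seed is a hypothetical choice."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def r2(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error, in the units of the target (CCE %)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def bias(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean error; under this sign convention, positive values mean over-prediction."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)


train_idx, test_idx = train_test_split(278)
print(len(train_idx), len(test_idx))  # 222 56
```

With toy values `observed = [10, 20, 30]` and `predicted = [12, 18, 33]`, these functions give R2 = 0.915, RMSE ≈ 2.38, and bias = 1.0.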

In contrast to the remote sensing indices, the topographic attribute MrVBF (at 0-5 and 5-15 cm), MAT (at 15-30 cm), and Wind Effect (at 30-60 and 60-100 cm) were, after the soil covariates, the most effective in explaining the spatial variation of CCE%. The RF algorithm achieved the highest accuracy and lowest error, with R2 ranging from 0.76 to 0.83 and RMSE from 2.14% to 2.21%. DTr yielded weaker R2 values than RF on the validation dataset (0.39-0.52). Even so, its spatial predictions were generally similar to those of RF from the surface to the subsurface and more stable than those of k-NN. Unlike RF and DTr, k-NN did not perform acceptably in predicting CCE% at any of the standard depths.

Conclusion: Understanding the spatial distribution of CCE is necessary because of its effects on soil moisture availability and plant nutrient uptake. In the present study, we therefore propose the RF machine learning algorithm, combined with environmental covariates selected by PCA and expert knowledge, as the superior model. The maps produced with this approach have an acceptable level of reliability for agricultural and environmental management by managers, soil experts, and farmers.
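Of the three algorithms compared above, k-NN has the simplest prediction rule. A location's CCE is estimated as the average of the k training samples closest to it in covariate space, so the result depends directly on the distance metric and on how the covariates are scaled. The sketch below is a minimal, unweighted k-NN regressor that assumes Euclidean distance on z-scored covariates. The value of k, the scaling scheme, the toy data, and the function names are all hypothetical, since the abstract does not report the authors' settings.

```python
import math
from typing import List, Sequence, Tuple

Row = Sequence[float]


def zscore_fit(train_x: List[Row]) -> Tuple[List[float], List[float]]:
    """Column means and standard deviations of the training covariates."""
    n, p = len(train_x), len(train_x[0])
    means = [sum(r[j] for r in train_x) / n for j in range(p)]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in train_x) / n) or 1.0
           for j in range(p)]
    return means, sds


def zscore_apply(rows: List[Row], means: List[float], sds: List[float]) -> List[List[float]]:
    """Scale rows with training statistics so no covariate dominates the distance."""
    return [[(r[j] - m) / s for j, (m, s) in enumerate(zip(means, sds))] for r in rows]


def knn_predict(train_x: List[Row], train_y: Sequence[float], query: Row, k: int = 5) -> float:
    """Unweighted mean target of the k training samples nearest to `query` (Euclidean)."""
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical toy data: two covariates per sample, CCE % as the target.
train_x = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]]
train_y = [1.0, 2.0, 3.0, 100.0]
means, sds = zscore_fit(train_x)
scaled = zscore_apply(train_x, means, sds)
query = zscore_apply([[0.1, 0.1]], means, sds)[0]
print(knn_predict(scaled, train_y, query, k=3))  # 2.0
```

Every covariate enters the Euclidean distance with equal weight, so scaling matters. Without it, a covariate measured on a large numeric range would dominate the neighbor search.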

Keywords