Geoscientific Model Development (Feb 2022)

A new methodological framework for geophysical sensor combinations associated with machine learning algorithms to understand soil attributes

  • D. C. D. Mello,
  • G. V. Veloso,
  • M. G. D. Lana,
  • F. A. D. O. Mello,
  • R. R. Poppiel,
  • D. R. O. Cabrero,
  • L. A. D. L. Di Raimo,
  • C. E. G. R. Schaefer,
  • E. I. F. Filho,
  • E. P. Leite,
  • J. A. M. Demattê

DOI
https://doi.org/10.5194/gmd-15-1219-2022
Journal volume & issue
Vol. 15
pp. 1219 – 1246

Abstract

Geophysical sensors combined with machine learning algorithms were used to understand the pedosphere system and landscape processes and to model soil attributes. In this research, we used parent material, terrain attributes, and data from geophysical sensors in different combinations to test and compare different and novel machine learning algorithms for modeling soil attributes. We also analyzed the importance of pedoenvironmental variables in the predictive models. For that, we collected soil physicochemical and geophysical data (gamma-ray emission from uranium, thorium, and potassium; magnetic susceptibility; and apparent electrical conductivity) with three sensors (gamma-ray spectrometer, RS 230; susceptibilimeter, KT10, Terraplus; and conductivimeter, EM38, Geonics) at 75 points and analyzed the data. The best-performing models varied among soil attributes, with R² values of 0.48, 0.36, 0.44, 0.36, 0.25, and 0.31 for clay, sand, Fe₂O₃, TiO₂, SiO₂, and cation exchange capacity, respectively. Selecting covariates in three phases (variance close to zero, removal by correlation, and removal by importance) was adequate to increase model parsimony. The results were validated using nested leave-one-out cross-validation. The prediction of soil attributes by machine learning algorithms yielded adequate values for field-collected data, without any sample preparation, for most of the tested predictors (R² values ranging from 0.20 to 0.50). The use of four regression algorithms also proved important, since the best-performing algorithm differed among the predicted attributes. For each predictor, the best algorithm outperformed the use of a single mean value for the entire area, as judged by root mean square error (RMSE) and mean absolute error (MAE). The sensor combination that reached the highest model performance was the gamma-ray spectrometer with the susceptibilimeter.
The most important variables for most predictions were parent material, digital elevation, standardized height, and magnetic susceptibility. We concluded that soil attributes can be efficiently modeled by geophysical data using machine learning techniques and geophysical sensor combinations. This approach can facilitate future soil mapping in a more time-efficient and environmentally friendly manner.
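
The abstract names nested leave-one-out cross-validation and a comparison against an area-wide mean value using RMSE and MAE. The following is a minimal sketch of that evaluation scheme, not the authors' implementation: the one-covariate linear model, the synthetic data, and all function names (`fit_line`, `loo_sse`, `nested_loo`, `rmse`, `mae`) are assumptions made for illustration. The inner loop picks the covariate with the lowest leave-one-out error on the training points, and the outer loop predicts each held-out point with the model tuned without it.

```python
import math
import random


def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x for a single covariate."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:  # constant covariate: fall back to the mean
        return my, 0.0
    b = sum((x - mx) * (v - my) for x, v in zip(xs, ys)) / sxx
    return my - b * mx, b


def loo_sse(X, y, j):
    """Inner loop: leave-one-out squared error of a model on covariate j."""
    sse = 0.0
    for i in range(len(y)):
        X_in, y_in = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        a, b = fit_line([row[j] for row in X_in], y_in)
        sse += (y[i] - (a + b * X[i][j])) ** 2
    return sse


def nested_loo(X, y):
    """Outer loop: hold out one point, tune on the rest, then predict it."""
    preds, baseline = [], []
    for i in range(len(y)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        # Covariate choice uses only the training points of this outer fold.
        best = min(range(len(X[0])), key=lambda j: loo_sse(X_tr, y_tr, j))
        a, b = fit_line([row[best] for row in X_tr], y_tr)
        preds.append(a + b * X[i][best])
        baseline.append(sum(y_tr) / len(y_tr))  # mean-value reference
    return preds, baseline


def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


# Synthetic stand-in data (assumption): 20 points, 3 covariates,
# with the target driven by the first covariate plus noise.
random.seed(0)
X = [[random.random() for _ in range(3)] for _ in range(20)]
y = [2.0 * row[0] + random.gauss(0.0, 0.1) for row in X]

preds, baseline = nested_loo(X, y)
print("model    RMSE %.3f  MAE %.3f" % (rmse(y, preds), mae(y, preds)))
print("baseline RMSE %.3f  MAE %.3f" % (rmse(y, baseline), mae(y, baseline)))
```

Because tuning happens only inside each outer fold, the held-out point never influences the model that predicts it, which is the property that makes the nested scheme suitable for small samples such as the 75 points reported here.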