Geoscience Frontiers (Nov 2020)

Gully erosion spatial modelling: Role of machine learning algorithms in selection of the best controlling factors and modelling process

  • Hamid Reza Pourghasemi,
  • Nitheshnirmal Sadhasivam,
  • Narges Kariminejad,
  • Adrian L. Collins

Journal volume & issue
Vol. 11, no. 6
pp. 2207 – 2219

Abstract


This investigation assessed the efficacy of 10 widely used machine learning algorithms (MLA), comprising the least absolute shrinkage and selection operator (LASSO), generalized linear model (GLM), stepwise generalized linear model (SGLM), elastic net (ENET), partial least squares (PLS), ridge regression, support vector machine (SVM), classification and regression trees (CART), bagged CART, and random forest (RF), for gully erosion susceptibility mapping (GESM) in Iran. The locations of 462 existing gully erosion sites were mapped through extensive field surveys. Of these observations, 70% (323) were randomly assigned to algorithm calibration and the remaining 30% (139) to validation. Twelve controlling factors for gully erosion were ranked by importance using each MLA: soil texture, annual mean rainfall, digital elevation model (DEM), drainage density, slope, lithology, topographic wetness index (TWI), distance from rivers, aspect, distance from roads, plan curvature, and profile curvature. The MLA were compared on the gully erosion training dataset using statistical measures including the root mean square error (RMSE), mean absolute error (MAE), and R-squared. The RF algorithm yielded the lowest RMSE and MAE and the highest R-squared, and was therefore selected as the best model. Variable importance evaluation with the RF model showed that distance from rivers had the greatest influence on the occurrence of gully erosion, whereas plan curvature had the least. According to the GESM generated using RF, most of the study area is predicted to have a low (53.72%) or moderate (29.65%) susceptibility to gully erosion, whereas only small portions are identified as having a high (12.56%) or very high (4.07%) susceptibility.
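The 70/30 calibration/validation split and the three comparison metrics can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the helper names (`split_sites`, `rmse`, `mae`, `r_squared`), the random seed, and the toy predictions are hypothetical. The abstract does not say which R-squared definition was used; here it is computed as the squared Pearson correlation between observed and predicted values, which is one common convention (the 1 − SS_res/SS_tot form is another).

```python
import math
import random


def split_sites(sites, train_frac=0.7, seed=42):
    """Shuffle site IDs and split them into calibration and validation subsets."""
    shuffled = list(sites)
    # seed=42 is an arbitrary assumption, used only to make the split reproducible
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    return shuffled[:n_train], shuffled[n_train:]


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r_squared(obs, pred):
    """R-squared as the squared Pearson correlation of observed vs. predicted."""
    n = len(obs)
    mean_o, mean_p = sum(obs) / n, sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)


# 462 site IDs split 70/30, giving 323 calibration and 139 validation sites.
calib, valid = split_sites(range(462))
print(len(calib), len(valid))  # 323 139

# Hypothetical observed presence (1) / absence (0) values and model outputs,
# for illustration only; these are not data from the study.
observed = [1, 0, 1, 1, 0]
predictions = {
    "model_a": [0.9, 0.2, 0.8, 0.7, 0.1],
    "model_b": [0.6, 0.4, 0.5, 0.6, 0.5],
}
for name, pred in predictions.items():
    print(name, rmse(observed, pred), mae(observed, pred), r_squared(observed, pred))

# Select the model with the lowest RMSE, one of the criteria described above.
best = min(predictions, key=lambda m: rmse(observed, predictions[m]))
```

In practice the MLA would be ranked on all three measures together; selecting on RMSE alone here simply keeps the sketch short.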
The output of the RF model was validated using the receiver operating characteristic (ROC) curve, which yielded an area under the curve (AUC) of 0.985, indicating excellent predictive ability. The GESM prepared using the RF algorithm can help decision-makers target remedial actions to minimize the damage caused by gully erosion.
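An AUC value like the one reported above can be computed without drawing the ROC curve explicitly, because the area under the empirical ROC curve equals the Mann-Whitney statistic: the fraction of (positive, negative) pairs in which the positive site receives the higher score, with ties counted as one half. The sketch below assumes the validation set contains both gully (1) and non-gully (0) points, which the abstract does not describe; the function name `roc_auc` and the sample scores are hypothetical illustrations, not the study's data or code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney pairwise statistic.

    Equals the probability that a randomly chosen positive (gully) site
    receives a higher score than a randomly chosen negative (non-gully)
    site, counting ties as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical validation labels and susceptibility scores (not study data).
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.2]
print(roc_auc(labels, scores))  # 0.75
```

The double loop costs O(n_pos × n_neg) comparisons, which is adequate for a few hundred validation points; a sort-based rank computation scales better for larger sets.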

Keywords