Applied Sciences (Jul 2024)
Construction and Optimization of Landslide Susceptibility Assessment Model Based on Machine Learning
Abstract
Appropriate sample selection is the foundation of any machine learning model. In landslide susceptibility evaluation, however, non-landslide samples that fall within landslide-prone areas or are spatially biased lead to discrepancies in model predictions. To address the impact of non-landslide sample selection on landslide susceptibility prediction, this study uses the western region of Henan Province as a case study. A sample dataset of 834 landslide points is compiled from historical data, remote sensing interpretation, and field surveys. Ten environmental factors (elevation, slope, aspect, profile curvature, land cover, lithology, topographic wetness index, distance from rivers, distance from faults, and distance from roads) are chosen to establish an evaluation index system. Sampling areas for non-landslide (negative) samples are delineated from the susceptibility assessment produced by the information value model. Two sampling strategies are employed: whole-region random sampling (I) and partition-based random sampling (II). Random Forest (RF) and Back Propagation Neural Network (BPNN) models are then used to predict and map landslide susceptibility in the western region of Henan Province, and their prediction accuracy is evaluated. Ranked by area under the Receiver Operating Characteristic curve (AUC), the models order as follows: II-BPNN (AUC = 0.9522) > II-RF (AUC = 0.9464) > I-RF (AUC = 0.8247) > I-BPNN (AUC = 0.8068). In terms of AUC and accuracy, the II-RF model improves on the I-RF model by 12.17% and 15.61%, respectively. Likewise, the II-BPNN model improves on the I-BPNN model, with increases in AUC and accuracy of 14.54% and 16.52%, respectively. These gains indicate improved model performance and predictive capability.
In terms of recall and specificity, the II-RF and II-BPNN models increase recall by 15.09% and 17.47%, respectively, and specificity by 15.80% and 14.99%, respectively, relative to their strategy-I counterparts. These findings suggest that the optimized models better distinguish landslide from non-landslide areas, effectively reducing the uncertainty that point-based sample data introduce into landslide risk prediction.
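The difference between the two sampling strategies can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `sample_negatives`, the raster inputs, and the rule that strategy II draws only from the lowest 40% of susceptibility scores (`low_quantile`) are assumptions made for illustration, since the abstract does not state which susceptibility zones were used.

```python
import numpy as np

def sample_negatives(susceptibility, landslide_mask, n_samples,
                     strategy, low_quantile=0.4, seed=0):
    """Pick non-landslide sample cells from a susceptibility raster.

    susceptibility : 2-D float array, e.g. information value scores
    landslide_mask : 2-D bool array, True where a landslide is recorded
    strategy       : "I"  = random sampling over the whole region
                     "II" = random sampling restricted to a
                            low-susceptibility partition
    low_quantile   : hypothetical cut-off defining that partition
    Returns an (n_samples, 2) array of (row, col) indices.
    """
    rng = np.random.default_rng(seed)

    # Known landslide cells are never eligible as negative samples.
    candidates = ~landslide_mask

    if strategy == "II":
        # Assumed rule: keep only cells whose score is at or below
        # the chosen quantile of the susceptibility distribution.
        threshold = np.quantile(susceptibility, low_quantile)
        candidates &= susceptibility <= threshold
    elif strategy != "I":
        raise ValueError("strategy must be 'I' or 'II'")

    flat_idx = np.flatnonzero(candidates)
    if flat_idx.size < n_samples:
        raise ValueError("not enough candidate cells to sample from")

    # Draw without replacement so no cell is selected twice.
    chosen = rng.choice(flat_idx, size=n_samples, replace=False)
    return np.column_stack(np.unravel_index(chosen, susceptibility.shape))


if __name__ == "__main__":
    # Toy raster standing in for a real susceptibility map.
    rng = np.random.default_rng(42)
    scores = rng.random((100, 100))
    landslides = np.zeros((100, 100), dtype=bool)
    landslides[10:15, 20:25] = True

    whole_region = sample_negatives(scores, landslides, 200, "I")
    partitioned = sample_negatives(scores, landslides, 200, "II")
    print(whole_region.shape, partitioned.shape)
```

Under strategy I, some negative samples may land in cells that score as highly susceptible, which is the labeling noise the study aims to avoid. Strategy II trades spatial coverage for cleaner negative labels.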
Keywords