Remote Sensing (Oct 2024)
A Novel Strategy Coupling Optimised Sampling with Heterogeneous Ensemble Machine-Learning to Predict Landslide Susceptibility
Abstract
The accuracy of data-driven landslide susceptibility prediction depends heavily on the quality of non-landslide samples and the choice of machine-learning algorithm. Current methods rely on manual prior knowledge to draw negative samples randomly and quickly from landslide-free regions or from outside landslide buffer zones, but they often ignore the reliability of those samples. This poses a serious risk of including potential landslides in the training data and leads to erroneous outcomes. Furthermore, different machine-learning models have distinct classification capabilities, and applying a single model can readily over-fit the dataset and introduce uncertainty into predictions. To address these problems, this research takes Chenxi County, a hilly and mountainous area in southern China, as an example and proposes a strategy coupling optimised sampling with heterogeneous ensemble machine learning to enhance the accuracy of landslide susceptibility prediction. Initially, 21 landslide impact factors were derived from the following aspects: geology, hydrology, topography, meteorology, human activities, and geographical environment. Then, these factors were screened through correlation analysis and collinearity diagnosis. Afterwards, an optimised sampling (OS) method was used to select negative samples by fusing the reliability of non-landslide samples with certainty factor values, on the basis of environmental similarity and a statistical model. Subsequently, the selected non-landslide samples and historical landslides were combined to create machine-learning datasets. Finally, three baseline models (support vector machine, random forest, and back-propagation neural network) and a stacking ensemble model were employed to predict susceptibility. The findings indicated that the OS method, by considering the reliability of non-landslide samples, yielded higher-quality negative samples than currently widely used sampling methods. The stacking ensemble model outperformed the three baseline models. Notably, the hybrid OS–Stacking model was the most accurate, reaching an accuracy of up to 97.1%. The integrated strategy significantly improves the prediction of landslide susceptibility, making it reliable and effective for assessing regional geohazard risk.
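The abstract mentions certainty factor values but does not state how they are computed. As a minimal sketch, the snippet below assumes the piecewise certainty factor form commonly used in landslide susceptibility studies, where the landslide density of a factor class is compared with the landslide density of the whole study area. The function name, argument names, and the cell counts in the usage lines are hypothetical and are not taken from the paper.

```python
def certainty_factor(pp_class: float, pp_study_area: float) -> float:
    """Certainty factor (CF) of one factor class, in the range [-1, 1].

    pp_class:      landslide density inside the class
                   (landslide cells in class / all cells in class).
    pp_study_area: landslide density over the whole study area
                   (all landslide cells / all cells).

    Assumed form (not confirmed by the abstract):
      CF = (pp_class - pp_area) / (pp_class * (1 - pp_area))  if pp_class >= pp_area
      CF = (pp_class - pp_area) / (pp_area * (1 - pp_class))  otherwise
    A positive CF marks a class that favours landslides; a negative CF
    marks a class where landslides are less likely than on average.
    """
    if not (0.0 <= pp_class <= 1.0):
        raise ValueError("pp_class must lie in [0, 1]")
    if not (0.0 < pp_study_area < 1.0):
        raise ValueError("pp_study_area must lie strictly between 0 and 1")

    difference = pp_class - pp_study_area
    if pp_class >= pp_study_area:
        return difference / (pp_class * (1.0 - pp_study_area))
    return difference / (pp_study_area * (1.0 - pp_class))


# Hypothetical counts, for illustration only.
prior = 200 / 10_000          # 200 landslide cells in a 10,000-cell study area
steep_slopes = 50 / 1_000     # class with a higher landslide density than the prior
gentle_slopes = 20 / 4_000    # class with a lower landslide density than the prior

print(round(certainty_factor(steep_slopes, prior), 3))   # positive CF
print(round(certainty_factor(gentle_slopes, prior), 3))  # negative CF
```

Under this assumption, candidate non-landslide locations that fall in classes with strongly negative CF values are less likely to hide potential landslides. That is one plausible reading of why CF values are fused with sample reliability when selecting negative samples.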
Keywords