Water Practice and Technology (Jan 2024)
Assessment of resampling methods on performance of landslide susceptibility predictions using machine learning in Kendari City, Indonesia
Abstract
Landslide susceptibility predictions that rely on standalone models produce biased results. Class imbalance compounds this problem when the sample is small. This study proposes a landslide susceptibility prediction model based on three resampling approaches, cross-validation (CV), bootstrap (Bt), and random subsampling (RS), integrated with the following machine learning models: generalized linear model, support vector machine, random forest (RF), boosted regression trees, classification and regression tree, multivariate adaptive regression splines, mixture discriminant analysis, flexible discriminant analysis, maximum entropy, and maximum likelihood. The methodology was applied in Kendari City, an urban area that has experienced destructive erosion. Area under the ROC curve (AUC), true skill statistic (TSS), correlation coefficient (COR), normalized mutual information (NMI), and correct classification rate (CCR) were used to evaluate the predictive accuracy of the proposed model. The results show that resampling improves the performance of the standalone models. They also show that the standalone models performed better with the CV algorithm than with the Bt and RS algorithms. The Bt-RF model excels in the statistical measures (AUC = 0.97, TSS = 0.97, COR = 0.99, NMI = 0.50, and CCR = 0.93). Given the strong performance of the proposed models in a moderate-scale area, promising results can be expected from these models in other regions.
Highlights
The resampling technique is often neglected in pre-processing.
The cross-validation technique is superior to the other techniques (i.e. RS and Bt).
Distance from fault and rainfall correlate with the occurrence of landslides in the study area.
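As a minimal sketch of how the three resampling schemes named in the abstract differ, the Python below generates train/test index splits for k-fold cross-validation, bootstrap, and random subsampling, and computes TSS using its standard definition (sensitivity + specificity - 1). The abstract does not state the software or settings used in the study, so the function names, the out-of-bag test set for the bootstrap, and the 30% test fraction are illustrative assumptions.

```python
import random

def kfold_splits(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all indices
    for i in range(k):
        train = [j for f_i, f in enumerate(folds) if f_i != i for j in f]
        yield train, folds[i]

def bootstrap_splits(n, repeats, seed=0):
    """Yield (train, test): train is drawn with replacement.

    Assumption: the test set is the out-of-bag samples (those never drawn).
    """
    rng = random.Random(seed)
    for _ in range(repeats):
        train = [rng.randrange(n) for _ in range(n)]
        in_bag = set(train)
        test = [j for j in range(n) if j not in in_bag]
        yield train, test

def subsample_splits(n, repeats, test_fraction=0.3, seed=0):
    """Yield (train, test) from repeated random partitions without replacement.

    Assumption: the 30% default test fraction is illustrative only.
    """
    rng = random.Random(seed)
    n_test = max(1, round(n * test_fraction))
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        yield idx[n_test:], idx[:n_test]

def true_skill_statistic(y_true, y_pred):
    """TSS = sensitivity + specificity - 1 for binary labels (1 = landslide)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn) + tn / (tn + fp) - 1
```

In cross-validation every sample is tested exactly once, whereas bootstrap training sets contain duplicates and random subsampling may test the same sample in several repeats. A fitted model would be evaluated on each `test` split and the scores averaged across splits.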
Keywords