Geomatics, Natural Hazards & Risk (Jan 2021)
Effect of spatial resolution and data splitting on landslide susceptibility mapping using different machine learning algorithms
Abstract
With growing computational resources and data availability, machine learning (ML) models are gaining wide attention in landslide modeling. This study evaluates the effect of spatial resolution and data splitting on landslide susceptibility mapping using five ML algorithms: naïve Bayes (NB), k-nearest neighbors (KNN), logistic regression (LR), random forest (RF) and support vector machines (SVM). The maps were developed using twelve landslide conditioning factors at two resolutions, 12.5 m and 30 m. To identify the effect of data splitting on model performance, 2162 landslide points and an equal number of non-landslide points were used to train and test the models with k-fold cross-validation, varying the number of folds from two to ten. The results indicate that the spatial resolution of the dataset affects the performance of all the algorithms considered, while the effect of data splitting is significant for the KNN and RF algorithms. For the same number of folds, all the algorithms performed better on the 12.5 m dataset. For the RF algorithm, however, the accuracy and area-under-the-curve values of 7-, 8-, 9- and 10-fold cross-validation at 30 m resolution were better than those of 2- and 3-fold cross-validation at 12.5 m resolution.
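To make the data-splitting setup concrete, the following is a minimal sketch of how the number of folds changes the share of samples available for training. It is not the authors' code: the sample counts come from the abstract, while the shuffling, the seed, and the helper names `k_fold_indices` and `split_summary` are assumptions made for illustration. In k-fold cross-validation each fold is held out once for testing, so the training share is (k - 1)/k, which rises from 50% at two folds to 90% at ten.

```python
import random

N_LANDSLIDE = 2162          # landslide points, from the abstract
N_TOTAL = 2 * N_LANDSLIDE   # plus an equal number of non-landslide points


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for a shuffled k-fold split.

    Hypothetical helper: the shuffle and seed are assumptions, not
    details taken from the study.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one sample.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def split_summary(n_samples=N_TOTAL, folds=range(2, 11)):
    """Return {k: (train_size, test_size)} for the first fold of each k."""
    summary = {}
    for k in folds:
        train_idx, test_idx = next(k_fold_indices(n_samples, k))
        summary[k] = (len(train_idx), len(test_idx))
    return summary


if __name__ == "__main__":
    for k, (n_train, n_test) in split_summary().items():
        share = n_train / N_TOTAL
        print(f"k={k:2d}: train={n_train}, test={n_test}, train share={share:.1%}")
```

In a full experiment, each (train, test) pair would be used to fit and score one of the five classifiers, and the accuracy and area-under-the-curve values would be averaged over the k folds.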
Keywords