Effect of spatial resolution and data splitting on landslide susceptibility mapping using different machine learning algorithms

Minu Treesa Abraham; Neelima Satyam; Prashita Jain; Biswajeet Pradhan; Abdullah Alamri

doi:10.1080/19475705.2021.2011791

Geomatics, Natural Hazards & Risk (Jan 2021)

Affiliations

Minu Treesa Abraham: Department of Civil Engineering, Indian Institute of Technology Indore
Neelima Satyam: Department of Civil Engineering, Indian Institute of Technology Indore
Prashita Jain: Department of Civil Engineering, Indian Institute of Technology Indore
Biswajeet Pradhan: Centre for Advanced Modelling and Geospatial Information Systems (CAMGIS), School of Civil and Environmental Engineering, Faculty of Engineering and Information Technology, University of Technology Sydney
Abdullah Alamri: Department of Geology and Geophysics, College of Science, King Saud University

Journal volume & issue: Vol. 12, no. 1
pp. 3381 – 3408

Abstract

With increasing computational capacity and data availability, machine learning (ML) models are gaining wide attention in landslide modeling. This study evaluates the effect of spatial resolution and data splitting on landslide susceptibility mapping using five ML algorithms: naïve Bayes (NB), k-nearest neighbors (KNN), logistic regression (LR), random forest (RF) and support vector machines (SVM). The maps were developed using twelve landslide conditioning factors at two resolutions, 12.5 m and 30 m. To identify the effect of data splitting on model performance, 2162 landslide points and an equal number of non-landslide points were used to train and test the models with k-fold cross-validation, varying the number of folds from two to ten. Results indicated that the spatial resolution of the dataset affects the performance of all the algorithms considered, while the effect of data splitting is significant for the KNN and RF algorithms. For the same number of folds, all the algorithms performed better with the 12.5 m dataset. For the RF algorithm, the accuracy and area-under-the-curve (AUC) values of 7-, 8-, 9- and 10-fold cross-validation at 30 m resolution were also better than those of 2- and 3-fold cross-validation at 12.5 m resolution.
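The k-fold evaluation described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes stratified folds (the abstract does not say how folds were drawn), uses a hand-written KNN scorer with five neighbours in place of the study's models, and runs on hypothetical two-feature toy data rather than the twelve conditioning factors. The helper names `stratified_k_fold`, `knn_score`, `roc_auc` and `cross_validate` are illustrative only.

```python
import random


def stratified_k_fold(y, k, seed=0):
    """Assumption: deal each class's shuffled indices round-robin into k folds,
    so every fold keeps the 1:1 landslide/non-landslide balance."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, label in enumerate(y) if label == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def knn_score(train_X, train_y, x, n_neighbors=5):
    """Fraction of the n_neighbors nearest training points (Euclidean) labelled 1 (landslide)."""
    nearest = sorted(
        range(len(train_X)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(train_X[j], x)),
    )[:n_neighbors]
    return sum(train_y[j] for j in nearest) / n_neighbors


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(labels, scores) if t == 1]
    neg = [s for t, s in zip(labels, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def cross_validate(X, y, k, seed=0):
    """Mean accuracy and mean AUC of the KNN sketch over k folds."""
    folds = stratified_k_fold(y, k, seed)
    accs, aucs = [], []
    for i, test_idx in enumerate(folds):
        # Every fold except the i-th one forms the training set.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        tr_X = [X[j] for j in train_idx]
        tr_y = [y[j] for j in train_idx]
        scores = [knn_score(tr_X, tr_y, X[j]) for j in test_idx]
        labels = [y[j] for j in test_idx]
        preds = [1 if s >= 0.5 else 0 for s in scores]
        accs.append(sum(p == t for p, t in zip(preds, labels)) / len(labels))
        aucs.append(roc_auc(labels, scores))
    return sum(accs) / k, sum(aucs) / k


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical toy data: 100 "landslide" (1) and 100 "non-landslide" (0) points, two features.
    X = [(rng.gauss(1, 1), rng.gauss(1, 1)) for _ in range(100)] + \
        [(rng.gauss(-1, 1), rng.gauss(-1, 1)) for _ in range(100)]
    y = [1] * 100 + [0] * 100
    for k in range(2, 11):  # two to ten folds, as in the study
        acc, auc = cross_validate(X, y, k)
        print(f"{k:2d}-fold: accuracy={acc:.3f}, AUC={auc:.3f}")
```

With the study's 2162 landslide and 2162 non-landslide points, this stratified 3-fold split yields test folds of 1442, 1442 and 1440 points. Raising k shrinks each test fold and enlarges the training set, which is the trade-off varied from two to ten folds in the study.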

Published in Geomatics, Natural Hazards & Risk

ISSN: 1947-5705 (Print); 1947-5713 (Online)
Publisher: Taylor & Francis Group
Country of publisher: United Kingdom
LCC subjects: Technology: Environmental technology. Sanitary engineering; Geography. Anthropology. Recreation: Environmental sciences; Social Sciences: Industries. Land use. Labor: Management. Industrial management: Risk in industry. Risk management
Website: https://www.tandfonline.com/journals/tgnh