Factors Affecting Landslide Susceptibility Mapping: Assessing the Influence of Different Machine Learning Approaches, Sampling Strategies and Data Splitting

Minu Treesa Abraham; Neelima Satyam; Revuri Lokesh; Biswajeet Pradhan; Abdullah Alamri

doi:10.3390/land10090989

Land (Sep 2021)


Affiliations

Minu Treesa Abraham: Department of Civil Engineering, Indian Institute of Technology Indore, Indore 453552, India
Neelima Satyam: Department of Civil Engineering, Indian Institute of Technology Indore, Indore 453552, India
Revuri Lokesh: Department of Civil Engineering, Indian Institute of Technology Indore, Indore 453552, India
Biswajeet Pradhan: Centre for Advanced Modelling and Geospatial Information Systems (CAMGIS), Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney P.O. Box 123, Australia
Abdullah Alamri: Department of Geology & Geophysics, College of Science, King Saud University, P.O. Box 2455, Riyadh 11451, Saudi Arabia

DOI: https://doi.org/10.3390/land10090989
Journal volume & issue: Vol. 10, no. 9, p. 989

Abstract


Data-driven methods are widely used for Landslide Susceptibility Mapping (LSM). The results of these methods are sensitive to several factors, such as the quality of input data, choice of algorithm, sampling strategy, and data splitting ratio. In this study, five Machine Learning (ML) algorithms are used for LSM of the Wayanad district in Kerala, India, using two sampling strategies and nine train-to-test ratios in cross validation. The results show that the Random Forest (RF), K Nearest Neighbors (KNN), and Support Vector Machine (SVM) algorithms provide better results than Naïve Bayes (NB) and Logistic Regression (LR) for the study area. The NB and LR algorithms are less sensitive to the sampling strategy and data splitting, while the performance of the other three algorithms is considerably influenced by the sampling strategy. The results indicate that both the choice of algorithm and the sampling strategy are critical in obtaining the best-suited landslide susceptibility map for a region. The accuracies of the KNN, RF, and SVM algorithms increased by 10.51%, 10.02%, and 4.98%, respectively, with the use of polygon landslide inventory data, while the performance of the NB and LR algorithms was slightly reduced with the use of polygon data. Thus, the sampling strategy and data splitting ratio are less consequential for the NB and LR algorithms, while more data points provide better results for the KNN, RF, and SVM algorithms.
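As an illustration of how the number of folds in cross validation fixes the train-to-test ratio, the following is a minimal sketch using only the Python standard library. The abstract does not state which nine ratios were used; the range of k = 2 to 10 folds below is an assumption chosen because it yields nine ratios, and the helper names `k_fold_indices` and `train_test_ratio` are hypothetical, not taken from the paper.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Distribute the shuffled samples as evenly as possible across k folds.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


def train_test_ratio(k):
    """Return (train %, test %) implied by k-fold cross validation."""
    test_pct = 100.0 / k
    return 100.0 - test_pct, test_pct


if __name__ == "__main__":
    # Assumption: k = 2..10 gives nine train-to-test ratios.
    for k in range(2, 11):
        train_pct, test_pct = train_test_ratio(k)
        print(f"k={k:2d}: train {train_pct:.1f}% / test {test_pct:.1f}%")
```

Each (train, test) index pair could then be used to fit and score any of the five classifiers on the sampled landslide and non-landslide points, so that the sensitivity of each algorithm to the splitting ratio can be compared.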

Published in Land

ISSN: 2073-445X (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Agriculture
Website: http://www.mdpi.com/journal/land
