Evaluating Landslide Susceptibility Using Sampling Methodology and Multiple Machine Learning Models

Yingze Song; Degang Yang; Weicheng Wu; Xin Zhang; Jie Zhou; Zhaoxu Tian; Chencan Wang; Yingxu Song

doi:10.3390/ijgi12050197

ISPRS International Journal of Geo-Information (May 2023)

Affiliations

Yingze Song: Institute of Computer and Information Science, Chongqing Normal University, Chongqing 400047, China
Degang Yang: Institute of Computer and Information Science, Chongqing Normal University, Chongqing 400047, China
Weicheng Wu: Key Laboratory for Digital Land and Resources of Jiangxi Province, East China University of Technology, Nanchang 330013, China
Xin Zhang: Institute of Computer and Information Science, Chongqing Normal University, Chongqing 400047, China
Jie Zhou: Institute of Computer and Information Science, Chongqing Normal University, Chongqing 400047, China
Zhaoxu Tian: Institute of Computer and Information Science, Chongqing Normal University, Chongqing 400047, China
Chencan Wang: Institute of Computer and Information Science, Chongqing Normal University, Chongqing 400047, China
Yingxu Song: Key Laboratory for Digital Land and Resources of Jiangxi Province, East China University of Technology, Nanchang 330013, China

DOI: https://doi.org/10.3390/ijgi12050197
Journal volume & issue: Vol. 12, no. 5, p. 197

Abstract

Landslide susceptibility assessment (LSA) based on machine learning methods has been widely used in landslide geological hazard management and research. However, the problem of sample imbalance, in which landslide samples are typically far fewer than non-landslide samples, is often overlooked, even though it is one of the important factors affecting the performance of landslide susceptibility models. In this paper, we take the Wanzhou district of Chongqing city as an example, where the dataset contains more than 580,000 samples and the ratio of positive (landslide) to negative (non-landslide) samples is 1:19. We oversample or undersample the imbalanced dataset to balance it, and then compare the performance of machine learning models under the different sampling strategies. Three classic machine learning algorithms, logistic regression, random forest and LightGBM, are used for LSA modeling. The results show that the models trained directly on the imbalanced dataset perform the worst: their extremely low recall indicates that they can rarely identify landslide samples and cannot be applied in practice. Compared with the original dataset, the resampled datasets yield improved predictive performance across the classifiers, reflected in higher AUC values and recall. The best model was the random forest model using oversampling (O_RF) (AUC = 0.932).
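To make the imbalance problem concrete, the following is a minimal sketch, not the authors' implementation, of random oversampling and random undersampling on a toy dataset with the same 1:19 class ratio. The abstract does not say which resampling algorithms were used, so plain random resampling is an assumption here, and all function names, the toy features, and the sample counts are hypothetical. The sketch also shows why recall, rather than accuracy, exposes the failure: a trivial model that always predicts "non-landslide" scores 95% accuracy while detecting no landslides.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples (with replacement) until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_x.extend(extra)
        out_y.extend([cls] * len(extra))
    return out_x, out_y


def random_undersample(samples, labels, seed=0):
    """Randomly keep only as many samples per class as the smallest class has."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_x, out_y = [], []
    for cls in counts:
        pool = [x for x, y in zip(samples, labels) if y == cls]
        out_x.extend(rng.sample(pool, target))
        out_y.extend([cls] * target)
    return out_x, out_y


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true positive-class samples that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy data at a 1:19 ratio: 50 landslide (1) and 950 non-landslide (0) samples.
# The single numeric feature is a placeholder, not a real conditioning factor.
labels = [1] * 50 + [0] * 950
samples = [[float(i)] for i in range(len(labels))]

over_x, over_y = random_oversample(samples, labels)
under_x, under_y = random_undersample(samples, labels)

# A trivial baseline that never predicts a landslide.
all_negative = [0] * len(labels)
baseline_accuracy = accuracy(labels, all_negative)
baseline_recall = recall(labels, all_negative)

if __name__ == "__main__":
    print("original    :", dict(Counter(labels)))
    print("oversampled :", dict(Counter(over_y)))
    print("undersampled:", dict(Counter(under_y)))
    print("all-negative baseline accuracy:", baseline_accuracy)
    print("all-negative baseline recall  :", baseline_recall)
```

In practice, resampling would be applied to the training split only, so that the evaluation data keep the original class distribution.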

Published in ISPRS International Journal of Geo-Information

ISSN: 2220-9964 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Geography. Anthropology. Recreation: Geography (General)
Website: http://www.mdpi.com/journal/ijgi
