IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2024)
An Integrated Framework of Positive-Unlabeled and Imbalanced Learning for Landslide Susceptibility Mapping
Abstract
Machine learning is pivotal in data-driven landslide susceptibility mapping (LSM). However, two challenges persist: the uncertainty of negative samples and the imbalance between positive and negative samples, both of which lead to misjudgments and overestimation. This study introduces a novel LSM framework that integrates positive-unlabeled (PU) learning with imbalanced learning methods to make full and correct use of the vast pool of unlabeled samples. First, a prior model based on the spy algorithm is built to obtain reliable negative (RN) samples, which are used to create imbalanced training and testing sets. Subsequently, four imbalanced learning models, namely synthetic minority oversampling technique-deep neural network (SMOTE-DNN), adaptive synthetic-DNN (ADASYN-DNN), balanced random forest (BRF), and EasyEnsemble (EE), are applied to these imbalanced sets to generate the final prediction models. We tested the framework on a dataset of regional rainfall-induced landslides in Beijing, China. The positive impact of the RN samples is evaluated against baseline models, and extensive saturation tests are conducted across various imbalance ratios. Imbalanced learning methods enhanced prediction for the negative class, with balance peaks observed in the saturation tests. BRF showed the best performance and stability across the imbalance ratios. The framework can improve prediction accuracy for both positive and negative classes, which has the potential to reduce overestimation and misclassification and holds promise for significantly influencing future modeling strategies in LSM.
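The RN-selection step can be illustrated with a minimal sketch of the general spy technique from PU learning. This is not the authors' implementation: the abstract does not specify the prior model, so a simple centroid-distance score stands in for it here, and the function names, the `spy_frac` and `keep` parameters, and the toy coordinates are all assumptions made for illustration.

```python
import math
import random


def centroid(rows):
    """Component-wise mean of a list of equal-length points."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def score(x, pos_c, unl_c):
    """Stand-in for a prior model: higher means more positive-like."""
    return math.dist(x, unl_c) - math.dist(x, pos_c)


def spy_reliable_negatives(positives, unlabeled, spy_frac=0.15, keep=0.95, seed=0):
    """Return unlabeled points that score below nearly all spies.

    Assumes at least two positives and 0 < keep <= 1 (hypothetical defaults).
    """
    rng = random.Random(seed)
    n_spies = min(len(positives) - 1, max(1, int(len(positives) * spy_frac)))
    spy_idx = set(rng.sample(range(len(positives)), n_spies))
    spies = [p for i, p in enumerate(positives) if i in spy_idx]
    rest = [p for i, p in enumerate(positives) if i not in spy_idx]

    # Fit the stand-in model: remaining positives vs. unlabeled plus spies.
    pos_c = centroid(rest)
    unl_c = centroid(list(unlabeled) + spies)

    # Threshold: the score that roughly a `keep` fraction of spies reach.
    spy_scores = sorted(score(s, pos_c, unl_c) for s in spies)
    cut = spy_scores[int(len(spy_scores) * (1 - keep))]

    # Unlabeled points scoring below the threshold are treated as RN samples.
    return [u for u in unlabeled if score(u, pos_c, unl_c) < cut]


# Toy data (made up): known landslide cells cluster near (5, 5).
positives = [(5 + 0.1 * (i % 5), 5 + 0.1 * (i // 5)) for i in range(20)]
true_negatives = [(0.1 * (i % 6), 0.1 * (i // 6)) for i in range(30)]
hidden_positives = [(5.2, 5.1)] * 5
unlabeled = true_negatives + hidden_positives
reliable_negatives = spy_reliable_negatives(positives, unlabeled)
```

In a workflow like the one described, the selected RN samples would then be paired with the known positives to form the imbalanced training and testing sets; any probabilistic classifier could replace the centroid score.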
Keywords