Flood Susceptibility Assessment with Random Sampling Strategy in Ensemble Learning (RF and XGBoost)

Hancheng Ren; Bo Pang; Ping Bai; Gang Zhao; Shu Liu; Yuanyuan Liu; Min Li

doi:10.3390/rs16020320

Remote Sensing (Jan 2024)

Affiliations

Hancheng Ren: College of Water Sciences, Beijing Normal University, Beijing 100875, China
Bo Pang: College of Water Sciences, Beijing Normal University, Beijing 100875, China
Ping Bai: Kunming Flood Control and Drought Relief Headquarters Office, Kunming 650000, China
Gang Zhao: Institute of Industrial Science, University of Tokyo, Tokyo 153-8505, Japan
Shu Liu: China Institute of Water Resources and Hydropower Research, Beijing 100038, China
Yuanyuan Liu: China Institute of Water Resources and Hydropower Research, Beijing 100038, China
Min Li: China Institute of Water Resources and Hydropower Research, Beijing 100038, China

DOI: https://doi.org/10.3390/rs16020320
Journal volume & issue: Vol. 16, no. 2, p. 320

Abstract

Due to the complex interaction of urban and mountainous floods, assessing flood susceptibility in mountainous urban areas is a challenging task in environmental research and risk analysis. Data-driven machine learning methods can evaluate flood susceptibility in mountainous urban areas that lack essential hydrological data by using remote sensing data and limited historical inundation records. In this study, two ensemble learning algorithms, Random Forest (RF) and XGBoost, were adopted to assess the flood susceptibility of Kunming, a typical mountainous urban area prone to severe flood disasters. A flood inventory was created using flood observations from 2018 to 2022. The spatial database included 10 explanatory factors, encompassing climatic, geomorphic, and anthropogenic factors. Artificial Neural Network (ANN) and Support Vector Machine (SVM) were selected for model comparison. To minimize the influence of expert opinions on model training, this study selected negative samples by uniform random sampling in historically non-flooded areas. The results demonstrated that (1) ensemble learning algorithms offer higher accuracy than the other machine learning methods, with RF achieving the highest accuracy, evidenced by an area under the curve (AUC) of 0.87, followed by XGBoost at 0.84, surpassing both ANN (0.83) and SVM (0.82); (2) the interpretability of ensemble learning highlighted the differences in the potential distribution of the training data's positive and negative samples. Feature importance in ensemble learning can be used to minimize human bias in the collection of flooded-site samples, and more targeted flood susceptibility maps of the study area's road network were obtained; and (3) ensemble learning algorithms exhibited greater stability and robustness on datasets with varied negative samples, as evidenced by their performance in F1-Score, Kappa, and AUC metrics. This paper further substantiates the superiority of ensemble learning in flood susceptibility assessment tasks from the perspectives of accuracy, interpretability, and robustness, enhances the understanding of the impact of negative samples on such assessments, and optimizes the specific process for urban flood susceptibility assessment using data-driven methods.
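The abstract names two technical ingredients that a short sketch may help make concrete: drawing negative samples uniformly at random from non-flooded areas, and scoring a classifier with F1-Score, Kappa, and AUC. The Python below is only an illustration under stated assumptions, not the authors' implementation: the grid representation, the function names (`sample_negatives`, `f1_score`, `cohen_kappa`, `auc`), and the seed handling are all hypothetical, and the paper's actual data and model code are not reproduced here.

```python
import random


def sample_negatives(grid_shape, flooded_cells, n_samples, seed=0):
    """Draw n_samples non-flooded cells uniformly at random, without replacement.

    Assumption: the study area is a regular grid and flooded sites are
    given as (row, col) cells; both are placeholders for illustration.
    """
    rows, cols = grid_shape
    flooded = set(flooded_cells)
    candidates = [(r, c) for r in range(rows) for c in range(cols)
                  if (r, c) not in flooded]
    if n_samples > len(candidates):
        raise ValueError("not enough non-flooded cells to sample from")
    return random.Random(seed).sample(candidates, n_samples)


def confusion(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, with 1 = flooded."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN)."""
    tp, _, fp, fn = confusion(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    n = tp + tn + fp + fn
    p_observed = (tp + tn) / n
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0


def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random
    negative, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

Repeating `sample_negatives` with different seeds and re-evaluating each model on the resulting datasets is one plausible way to probe the stability with respect to negative samples that the abstract describes; the number of repetitions and the sample sizes used in the study are not stated here.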

Published in Remote Sensing

ISSN: 2072-4292 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science
Website: http://www.mdpi.com/journal/remotesensing/
