Water Science and Technology (May 2024)

Prediction of flood sensitivity based on Logistic Regression, eXtreme Gradient Boosting, and Random Forest modeling methods

  • Ying Wu,
  • Zhiming Zhang,
  • Xiaotian Qi,
  • Wenhan Hu,
  • Shuai Si

DOI
https://doi.org/10.2166/wst.2024.146
Journal volume & issue
Vol. 89, no. 10
pp. 2605 – 2624

Abstract

Floods are among the most destructive disasters, causing loss of life and property worldwide every year. This study aimed to identify the best-performing model for flood sensitivity assessment and to analyze the key characteristic factors. The spatial pattern of flood sensitivity was evaluated using three machine learning (ML) models: Logistic Regression (LR), eXtreme Gradient Boosting (XGBoost), and Random Forest (RF). Suqian City in Jiangsu Province was selected as the study area, and a random sample dataset of historical flood points was constructed. Fifteen meteorological, hydrological, and geospatial variables were considered in the flood sensitivity assessment, of which 12 were retained after a multi-collinearity analysis. Among the three models, RF achieved the highest AUC value, accuracy, and overall evaluation performance, making it a reliable and effective flood risk assessment model. The main output of the study is a flood sensitivity map divided into five categories, ranging from very low to very high sensitivity. According to the RF model (the most accurate of the three), high-risk areas cover about 44% of the study area and are mainly concentrated in the central, eastern, and southern parts of the old city area.

HIGHLIGHTS
  • Correlation analysis and multi-collinearity analysis were performed to screen flood risk factors by VIF value.
  • Multi-model evaluation: the RF model is superior to the LR and XGBoost models.
  • The Random Forest algorithm was used to explore the importance of flood risk factors.
  • Flood sensitivity maps were generated, providing intuitive, clear information to help decision makers identify high-risk areas more effectively.
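The VIF-based screening mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the variable names, the synthetic data, and the cutoff of 10 are assumptions made for illustration, since the abstract does not state the threshold used. The sketch relies on the identity that the VIF of each variable equals the corresponding diagonal entry of the inverse correlation matrix, and it drops the worst variable repeatedly until every remaining VIF is at or below the cutoff.

```python
import math
import random


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif_scores(data):
    """Return {name: VIF}; VIF_j is the j-th diagonal entry of the inverse correlation matrix."""
    names = list(data)
    inverse = invert(correlation_matrix([data[name] for name in names]))
    return {name: inverse[i][i] for i, name in enumerate(names)}


def screen_by_vif(data, threshold=10.0):
    """Drop the variable with the highest VIF until all VIFs are <= threshold.

    The threshold of 10 is a common rule of thumb, assumed here for illustration.
    """
    kept = dict(data)
    while len(kept) > 1:
        scores = vif_scores(kept)
        worst = max(scores, key=scores.get)
        if scores[worst] <= threshold:
            break
        del kept[worst]
    return list(kept)


# Synthetic demonstration data (hypothetical factor names, not the study's dataset).
rng = random.Random(0)
elevation = [rng.gauss(0, 1) for _ in range(200)]
data = {
    "elevation": elevation,
    "elevation_noisy": [e + rng.gauss(0, 0.05) for e in elevation],  # near-duplicate
    "slope": [rng.gauss(0, 1) for _ in range(200)],
    "rainfall": [rng.gauss(0, 1) for _ in range(200)],
}
kept = screen_by_vif(data)
print(kept)
```

In this toy setup, one of the two nearly identical elevation columns is removed and the independent factors are retained. In practice, a library routine such as the variance inflation factor function in statsmodels would usually replace the hand-written matrix inversion.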

Keywords