Journal of Hydrology: Regional Studies (Apr 2024)
A novel framework for feature simplification and selection in flood susceptibility assessment based on machine learning
Abstract
Study region: Core urban agglomeration of the Yangtze River Delta, China.

Study focus: Traditional machine-learning studies of flood susceptibility often try to improve model performance by adding more input variables, which is impractical in regions with limited data availability. In this study, we constructed a system of 13 features for flood susceptibility assessment with machine learning. We then established a flexible framework, built mainly on importance-value calculation and repeated random sampling, to identify a minimal feature set that still yields high-performance classifiers. Finally, we verified the feasibility of the framework by comparing classifier performance and the resulting flood susceptibility maps.

New hydrological insights for the region: The results highlight the importance of five features in model development: Land Use / Land Cover, Impervious Area, Normalized Difference Vegetation Index, Distance to Lake, and Built-up Probability. These five features alone were sufficient to produce a classifier with Area Under the Curve (AUC) values above 0.9 on both the training and testing data. Susceptibility maps generated with different numbers of features showed that areas with sparse vegetation cover and areas near lakes face higher flood susceptibility. The strong classifier performance with reduced features (mean AUC > 0.9) and the consistency of the generated maps confirm the feasibility of the framework, offering theoretical and technical support for flood research in data-constrained regions.
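The abstract does not specify the importance measure, the classifiers, or the exact sampling scheme, so the sketch below is only one minimal reading of the procedure, written in Python with the standard library alone. Features are ranked by a hypothetical importance score (how far each single-feature AUC lies from 0.5) and added in rank order. Each candidate set is scored by its mean AUC over repeated random train/test splits, and the search stops at the first set whose mean training and testing AUC both exceed 0.9. The nearest-centroid scorer, all function names, and the split parameters are illustrative assumptions standing in for the authors' machine-learning classifiers, not their method.

```python
import random
from statistics import mean


def auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic; tied scores count as half a win."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def rank_by_univariate_auc(X, y, n_features):
    """Hypothetical importance value: |single-feature AUC - 0.5|, highest first."""
    imp = {c: abs(auc([x[c] for x in X], y) - 0.5) for c in range(n_features)}
    return sorted(imp, key=imp.get, reverse=True)


def centroid_scorer(train_X, train_y, cols):
    """Hypothetical stand-in classifier: nearest-centroid score on selected columns."""
    def centroid(label):
        rows = [x for x, lab in zip(train_X, train_y) if lab == label]
        return [mean(r[c] for r in rows) for c in cols]

    c_pos, c_neg = centroid(1), centroid(0)

    def score(x):
        d_pos = sum((x[c] - m) ** 2 for c, m in zip(cols, c_pos))
        d_neg = sum((x[c] - m) ** 2 for c, m in zip(cols, c_neg))
        return d_neg - d_pos  # higher means closer to the flood (positive) class

    return score


def mean_aucs(X, y, cols, repeats=20, test_frac=0.3, seed=0):
    """Mean training and testing AUC over repeated random train/test splits."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    train_aucs, test_aucs = [], []
    for _ in range(repeats):
        rng.shuffle(idx)
        cut = int(len(idx) * (1 - test_frac))
        tr, te = idx[:cut], idx[cut:]
        tr_X, tr_y = [X[i] for i in tr], [y[i] for i in tr]
        te_X, te_y = [X[i] for i in te], [y[i] for i in te]
        if len(set(tr_y)) < 2 or len(set(te_y)) < 2:
            continue  # AUC is undefined unless both classes are present
        score = centroid_scorer(tr_X, tr_y, cols)
        train_aucs.append(auc([score(x) for x in tr_X], tr_y))
        test_aucs.append(auc([score(x) for x in te_X], te_y))
    return mean(train_aucs), mean(test_aucs)


def minimal_feature_set(X, y, ranked_cols, threshold=0.9, **kw):
    """Add features in importance order; stop once both mean AUCs exceed threshold."""
    result = None
    for k in range(1, len(ranked_cols) + 1):
        cols = list(ranked_cols[:k])
        result = mean_aucs(X, y, cols, **kw)
        if min(result) > threshold:
            return cols, result
    return list(ranked_cols), result  # no prefix cleared the threshold
```

On real data, the importance ranking would ideally be computed inside each training split rather than on the full sample, to avoid leaking test information into feature selection; the sketch ranks once for brevity.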