Scientific Data (Mar 2025)
FloodCastBench: A Large-Scale Dataset and Foundation Models for Flood Modeling and Forecasting
Abstract
Abstract Effective flood forecasting is crucial for informed decision-making and emergency response. Existing flood datasets mainly describe flood events but lack dynamic process data suitable for machine learning (ML). This work introduces the FloodCastBench dataset, designed for ML-based flood modeling and forecasting, featuring four major flood events: Pakistan 2022, UK 2015, Australia 2022, and Mozambique 2019. FloodCastBench details the process of flood dynamics data acquisition, starting with input data preparation (e.g., topography, land use, rainfall) and flood measurement data collection (e.g., SAR-based maps, surveyed outlines) for hydrodynamic modeling. We deploy a widely recognized finite difference numerical solution to construct high-resolution spatiotemporal dynamic processes with 30-m spatial and 300-second temporal resolutions. Flood measurement data are used to calibrate the hydrodynamic model parameters and validate the flood inundation maps. FloodCastBench provides comprehensive low-fidelity and high-fidelity flood forecasting datasets specifically for ML. Furthermore, we establish a benchmark of foundational models for neural flood forecasting using FloodCastBench, validating its effectiveness in supporting ML models for spatiotemporal, cross-regional, and downscaled flood forecasting.