Ecological Informatics (Sep 2024)
Enhancing the streamflow simulation of a process-based hydrological model using machine learning and multi-source data
Abstract
Streamflow simulation is crucial for flood mitigation, ecological protection, and water resource planning. Process-based hydrological models and machine learning algorithms are the mainstream tools for streamflow simulation. However, their inherent limitations, such as high computational cost and large data requirements, make high-precision simulation challenging. This study developed a hybrid approach, BTOP_XGB, which integrates the Block-wise use of TOPMODEL (BTOP) model with eXtreme Gradient Boosting (XGBoost) to improve both the accuracy and the computational efficiency of streamflow simulation. In this approach, BTOP generates simulated streamflow from parameter sets drawn by the Latin hypercube sampling algorithm rather than by time-consuming calibration algorithms, which reduces computational cost. XGBoost then combines the BTOP-simulated streamflow with multi-source data to reduce simulation errors, and several input variable selection algorithms are employed to choose relevant inputs and remove redundant information for the model. The hybrid approach is validated and compared with the standalone models at three hydrological stations in the Jialing River basin, China. The results show that BTOP_XGB performs significantly better than the standalone BTOP and XGBoost models. The Nash-Sutcliffe efficiency (NSE) of BTOP_XGB at the Beibei, Xiaoheba, and Luoduxi stations increases by 54%, 21%, and 83%, respectively. Meanwhile, BTOP_XGB reduces computational time by more than 90% compared with the original calibrated BTOP. BTOP_XGB is also less affected by the parameter sample size and the amount of data, which demonstrates the robustness of the hybrid model. This study reduces the complexity of the hydrological model and enhances the stability of machine learning, jointly improving the reliability of streamflow simulation. The hybrid approach provides a potential shortcut for streamflow simulation over basins with large areas or limited observed data.
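As an illustration of two building blocks named in the abstract, the sketch below draws parameter sets by Latin hypercube sampling and scores a simulation with the NSE. It is a minimal, standard-library example and not the authors' implementation: the function names `latin_hypercube` and `nse`, the seed, and the parameter bounds are all hypothetical, and the bounds do not correspond to actual BTOP parameters.

```python
import random


def latin_hypercube(n_samples, bounds, seed=0):
    """Draw n_samples parameter sets from the given (low, high) bounds.

    Each parameter range is split into n_samples equal strata, and every
    stratum is sampled exactly once (the defining property of LHS).
    """
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One uniformly random point inside each stratum of this parameter.
        points = [low + (high - low) * (i + rng.random()) / n_samples
                  for i in range(n_samples)]
        # Shuffle so strata are paired randomly across parameters.
        rng.shuffle(points)
        columns.append(points)
    # Transpose: each returned row is one complete parameter set.
    return [list(row) for row in zip(*columns)]


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - sum((o - s)^2) / sum((o - mean(o))^2)."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


# Hypothetical bounds for two illustrative model parameters.
example_bounds = [(0.0, 1.0), (10.0, 20.0)]
parameter_sets = latin_hypercube(5, example_bounds)
```

In a workflow of the kind the abstract describes, each sampled parameter set would presumably drive one model run, and the resulting simulated series would be passed on as inputs to the machine learning stage. An NSE of 1 indicates a perfect match with observations, and an NSE of 0 means the simulation is no better than the mean of the observations.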