Journal of Hydroinformatics (Jul 2022)
Insights into enhanced machine learning techniques for surface water quantity and quality prediction based on data pre-processing algorithms
Abstract
The quantity and quality of streamflow are crucial components in the management and control of water resources, yet both are challenging to predict because of their nonstationarity and uncertainty. This paper presents an ensemble machine learning (ML) algorithm based on data pre-processing to support decisions on water resource management and water pollution control at the watershed scale, motivated by the nonlinear behaviour of streamflow. In the proposed hybrid model, a new time–frequency analysis algorithm, variational mode decomposition (VMD), is used to handle the nonlinearity and nonstationarity of the streamflow process. VMD decomposes the original water quality and quantity series into a set of intrinsic mode functions (IMFs) with different frequencies. An ensemble algorithm, bootstrap aggregating (bagging), is then coupled with two common ML methods, reduced error pruning tree (REPT) and random forest (RF), to predict each of the modes produced by VMD; bagging is used to reduce the variance among the base learners. Finally, the predicted value of the original water quality and quantity series is obtained by summing the predictions for all the decomposed modes. The proposed hybrid decomposition–ensemble model was applied to two stations on the Karoon River, Iran. The results indicate that the model can capture the nonlinear characteristics of the streamflow process for water quality and quantity simultaneously, and thus provides more accurate predictions than models that do not use frequency decomposition of the data.
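The decompose, predict and sum workflow described in the abstract can be illustrated with a minimal, self-contained Python sketch. It is not the authors' implementation and rests on loudly stated assumptions: a moving-average additive split stands in for VMD, and a bagged one-lag linear regressor stands in for the bagged REPT and RF learners. All function names (`additive_decompose`, `fit_lag1`, `bagged_forecast`, `decomposition_ensemble_forecast`) and the synthetic series are hypothetical and chosen only for illustration.

```python
import math
import random


def additive_decompose(series, windows=(3, 9)):
    """Split a series into additive components that sum back to the series.

    ASSUMPTION: this is a simple stand-in for VMD, not VMD itself. Each pass
    takes a centred moving average and keeps the detail that was removed.
    """
    components = []
    residual = list(series)
    for w in windows:
        half = w // 2
        smooth = []
        for i in range(len(residual)):
            lo, hi = max(0, i - half), min(len(residual), i + half + 1)
            smooth.append(sum(residual[lo:hi]) / (hi - lo))
        components.append([r - s for r, s in zip(residual, smooth)])
        residual = smooth
    components.append(residual)  # remaining low-frequency part
    return components


def fit_lag1(pairs):
    """Least-squares fit of y = a + b * x on (previous, next) value pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    if sxx == 0.0:  # degenerate sample: fall back to the mean
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in pairs) / sxx
    return my - b * mx, b


def bagged_forecast(component, n_models=25, rng=None):
    """One-step-ahead forecast averaged over bootstrap-trained base learners.

    ASSUMPTION: the base learner is a one-lag linear model, used here in
    place of the REPT and RF learners named in the abstract.
    """
    rng = rng or random.Random(0)
    pairs = list(zip(component[:-1], component[1:]))
    preds = []
    for _ in range(n_models):
        sample = [rng.choice(pairs) for _ in pairs]  # bootstrap resample
        a, b = fit_lag1(sample)
        preds.append(a + b * component[-1])
    return sum(preds) / len(preds)  # aggregation step of bagging


def decomposition_ensemble_forecast(series, seed=0):
    """Forecast each component separately, then add the forecasts together."""
    rng = random.Random(seed)
    components = additive_decompose(series)
    return sum(bagged_forecast(c, rng=rng) for c in components)


if __name__ == "__main__":
    # Synthetic series (seasonal cycle plus trend), not data from the study.
    flow = [10 + 3 * math.sin(2 * math.pi * t / 12) + 0.1 * t for t in range(120)]
    print(decomposition_ensemble_forecast(flow))
```

The property carried over from the abstract is that the components are additive, so the forecast of the original series is the sum of the per-component forecasts. In practice the stand-ins would be replaced by a real VMD routine and tree-based learners.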
HIGHLIGHTS
This study applies machine learning (ML) techniques simultaneously for modelling river water quality.
A new ensemble ML technique has been developed to assess and predict two important water quality and quantity parameters.
The bagging ensemble algorithm is used with the ML techniques, where it significantly improves model stability and accuracy.
A VMD data pre-processing technique is recommended to enhance the model's fidelity.
Keywords