Ecological Indicators (Feb 2023)
Prediction modelling framework comparative analysis of dissolved oxygen concentration variations using support vector regression coupled with multiple feature engineering and optimization methods: A case study in China
Abstract
Dissolved oxygen (DO) is an essential indicator for assessing water quality and managing aquatic environments, but accurately understanding and predicting the spatiotemporal variation of DO concentrations under the complex effects of different environmental factors remains challenging. In this study, a practical prediction framework for DO concentrations was proposed, based on the support vector regression (SVR) model coupled with multiple intelligent techniques (i.e., four data denoising techniques, three feature selection rules, and four hyperparameter optimization methods). The framework was tested using a data matrix (17,532 observations in total) of 12 indicators from three vital water quality monitoring stations of the longest inter-basin water diversion project in the world (i.e., the Middle-Route of the South-to-North Water Diversion Project of China), covering the period from 2017 to 2020. The results showed that the proposed framework could accurately predict DO concentration variations at different geographical locations. The model using the “wavelet analysis–LASSO regression–random search–SVR” combination at the Waihuanhe station had the best prediction performance, with root mean square error (RMSE), mean square error (MSE), mean absolute error (MAE), and coefficient of determination (R²) values of 0.251, 0.063, 0.190, and 0.911, respectively. Combining feature selection and hyperparameter optimization techniques can significantly improve the robustness and accuracy of the prediction model and can provide a new, general, and practical way of investigating and understanding the environmental drivers of DO concentration variations. For water quality management departments, the proposed framework can also identify the key parameters that warrant attention and monitoring as environmental factors change.
Future studies should assess the potential integrated water quality risk using multiple indicators in mega water diversion projects and/or similar water bodies.
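As a reading aid for the four performance measures reported above, the following minimal Python sketch computes RMSE, MSE, MAE, and R² from paired observed and predicted values using their standard definitions. The function name `regression_metrics`, the sample values, and the mg/L unit are illustrative assumptions; they are not taken from the study, and this is not the authors' code.

```python
import math


def regression_metrics(observed, predicted):
    """Return RMSE, MSE, MAE and R^2 for paired observed/predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    mse = ss_res / n
    return {
        "rmse": math.sqrt(mse),  # RMSE is the square root of MSE
        "mse": mse,
        "mae": sum(abs(r) for r in residuals) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


# Hypothetical DO values (assumed to be in mg/L), for illustration only.
observed = [8.0, 9.0, 10.0, 11.0]
predicted = [8.5, 9.0, 9.5, 11.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

Because RMSE is the square root of MSE, the two reported values can be cross-checked: 0.251 squared is approximately 0.063, which agrees with the reported MSE.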