Engineering Applications of Computational Fluid Mechanics (Dec 2024)
Explainable machine learning models for estimating daily dissolved oxygen concentration of the Tualatin River
Abstract
Monitoring river water quality is of fundamental importance in hydrological research, and the dissolved oxygen (DO) concentration is one of its most significant indicators. The current study aimed to estimate the daily minimum, maximum, and mean DO concentrations (DO min, DO max, DO mean) at a gauging station on the Tualatin River, United States. To that end, four machine learning models were established: support vector regression (SVR), multi-layer perceptron (MLP), random forest (RF), and gradient boosting (GB). Model accuracy was assessed with the root mean square error (RMSE), mean absolute error (MAE), coefficient of correlation (R), and Nash-Sutcliffe efficiency (NSE). The modeling results demonstrated that the SVR and MLP models surpassed the RF and GB models, and the SVR was concluded to be the best-performing method for estimating DO min, DO max, and DO mean. The best error statistics in the testing phase were obtained by the SVR model with the full set of four inputs when estimating the DO mean concentration (RMSE = 0.663 mg/l, MAE = 0.508 mg/l, R = 0.945, NSE = 0.875). Finally, the explainability of the superior models (i.e. the SVR models) was examined using SHapley Additive exPlanations (SHAP), applied here for the first time to DO concentration estimation. Evaluating the explainability of machine learning models provides useful information about the impact of each input variable used in model development. It was concluded that specific conductance (SC), followed by water temperature (WT), contributed most to the estimates of the DO min, DO max, and DO mean concentrations.
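The abstract names four evaluation metrics without giving their formulas. As a minimal sketch, assuming the standard textbook definitions of RMSE, MAE, Pearson's R, and NSE (the paper's exact formulations are not shown here), they could be computed as follows. The function name `evaluate_do_model` and the sample values are hypothetical and are not taken from the study.

```python
import numpy as np


def evaluate_do_model(observed, simulated):
    """Return RMSE, MAE, R, and NSE for paired observed/simulated DO values (mg/l).

    Hypothetical helper: assumes the standard definitions of each metric.
    """
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    err = sim - obs

    rmse = np.sqrt(np.mean(err ** 2))        # root mean square error
    mae = np.mean(np.abs(err))               # mean absolute error
    r = np.corrcoef(obs, sim)[0, 1]          # Pearson correlation coefficient
    # Nash-Sutcliffe efficiency: 1 minus the ratio of the squared-error sum
    # to the sum of squared deviations of the observations from their mean.
    nse = 1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2)

    return {"RMSE": rmse, "MAE": mae, "R": r, "NSE": nse}


# Illustrative values only (not data from the Tualatin River station).
example = evaluate_do_model([8.0, 9.0, 10.0, 11.0], [8.5, 9.5, 10.5, 11.5])
```

In this toy case the simulated series is offset from the observations by a constant 0.5 mg/l, so R equals 1 while NSE drops below 1. This illustrates why a correlation measure is usually reported alongside error-based metrics: R alone does not penalize systematic bias.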
Keywords