Results in Engineering (Mar 2024)
Explainable machine learning methods for predicting water treatment plant features under varying weather conditions
Abstract
Accurately predicting key features in wastewater treatment plants (WWTPs) is essential for optimizing plant performance and minimizing operational costs. This study assesses the potential of machine learning models for predicting the inflow to anoxic sludge reactors. First, it evaluates six models, namely k-Nearest Neighbors (kNN), Random Forest (RF), XGBoost, CatBoost, LightGBM, and Decision Tree Regression (DTR), for predicting the flow into the anoxic section under three weather conditions (dry, rainy, and stormy). Second, it introduces parsimonious models whose inputs are selected using variable importance from the XGBoost algorithm. Third, it employs SHAP (SHapley Additive exPlanations) to explain model predictions by quantifying the contribution of each feature. Data from the COST Benchmark Simulation Model (BSM1) are used to verify the effectiveness of the investigated models. Each dataset consists of 14 days of influent data at 15-minute intervals, with 80% of the data used for model training. Results show that ensemble learning methods, particularly CatBoost and XGBoost, predict anoxic section flow satisfactorily despite the increased variability under rainy and stormy conditions. Notably, the CatBoost and XGBoost models achieve average Mean Absolute Percentage Error (MAPE) values of 1.33% and 1.59%, respectively, outperforming the other methods.
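The data setup and error metric described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The function names `mape` and `chronological_split` are hypothetical, and splitting the series in time order is an assumption: the abstract states only that 80% of the data is used for training, not how that portion is chosen.

```python
# Illustrative sketch only; helper names and the time-ordered split are assumptions.

SAMPLES_PER_DAY = 24 * 60 // 15   # 15-minute intervals -> 96 samples per day
N_SAMPLES = 14 * SAMPLES_PER_DAY  # 14 days of influent data -> 1344 samples


def chronological_split(series, train_fraction=0.8):
    """Split a sequence into leading training and trailing test portions."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, expressed in percent."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


if __name__ == "__main__":
    # Placeholder indices stand in for the real flow measurements.
    train, test = chronological_split(list(range(N_SAMPLES)))
    print(len(train), len(test))

    # Toy values, not results from the study.
    print(mape([100.0, 200.0], [101.0, 198.0]))
```

Under these assumptions, each 14-day dataset holds 1344 samples, of which 1075 go to training and 269 to testing. A MAPE of 1.33% means the predicted flow deviates from the true flow by 1.33% of the true value on average.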