Water Supply (Apr 2024)
Prediction of microbiological non-compliances using a Boosted Regression Trees model: application on the drinking water distribution system of a whole country
Abstract
Universal access to safe drinking water is a fundamental human right and a requirement for a healthy life. Therefore, monitoring the quality of the supplied water is of utmost importance. To achieve this goal, there is a need to develop tools that support monitoring activities and improve efficiency. Forecasting models enable the prediction of pollution levels and facilitate the implementation of action plans. In this study, the Boosted Regression Trees method was employed to investigate the variables influencing water quality failures (WQFs) due to microbial contamination at the delivery point. The dataset used was obtained from localities across the country's distribution systems. The variables under consideration included physicochemical parameters such as pH, turbidity (NTU), and free chlorine (mg L−1), along with contextual parameters like the year, season, geographic location, and locality population. Indicators of microbial contamination assessed were the presence of total coliforms, Escherichia coli, and Pseudomonas aeruginosa. The most significant variables were geographic location, free chlorine content, and the population of the locality. The model achieved an AUC value of 0.77 and provided adequate predictions in the conducted tests. It enables the exploration of key factors affecting microbiological water quality, allowing for informed action to reduce associated risks.
HIGHLIGHTS
Boosted Regression Trees were employed to study the variables that influence water quality failures due to microbial contamination at the delivery point.
Both posed greatest risk to the public.
Drinking water suppliers can use this tool to improve their monitoring plans and public authorities can use this input to implement actions for preventing water contamination and to improve water safety plans.
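To make the reported AUC of 0.77 easier to interpret, the following minimal Python sketch computes an AUC from first principles: it is the probability that a randomly chosen non-compliant sample receives a higher predicted score than a randomly chosen compliant one, with ties counted as one half. The function name `roc_auc` and the labels and scores are hypothetical and purely illustrative; they are not the study's data, and this is not the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: 1 = water quality failure, 0 = compliant sample (illustrative coding).
    scores: predicted failure probabilities from any classifier.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # failure ranked above compliant sample
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical labels and scores, invented for illustration only.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70, 0.50, 0.30]

auc = roc_auc(labels, scores)
print(auc)  # 0.75
```

On this reading, an AUC of 0.77 means that in roughly 77% of failure/compliant pairs the model ranks the failing sample higher; 0.5 would correspond to random ranking and 1.0 to perfect separation.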
Keywords