Water Supply (Sep 2021)
Analysis of water quality indices and machine learning techniques for rating water pollution: a case study of Rawal Dam, Pakistan
Abstract
Water Quality Index (WQI) is a unique and effective rating technique for assessing the quality of water. Nevertheless, most of the indices are not applicable to all water types as these are dependent on core physico-chemical water parameters that can make them biased and sensitive towards specific attributes including: (i) time, location and frequency for data sampling; (ii) number, variety and weights allocation of parameters. Therefore, there is a need to evaluate these indices to eliminate uncertainties that make them unpredictable and which may lead to manipulation of the water quality classes. The present study calculated five WQIs for two temporal periods: (i) June to December 2019 obtained in real time (using the Internet of Things (IoT) nodes) at inlet and outlet streams of Rawal Dam; (ii) 2012–2019 obtained from the Rawal Dam Water Filtration Plant, collected through GIS-based grab sampling. The computed WQIs categorized the collected datasets as ‘Very Poor’, primarily owing to the uneven distribution of the water samples that has led to class imbalance in the data. Additionally, this study investigates the classification of water quality using machine learning algorithms namely: Decision Tree (DT), k-Nearest Neighbor (KNN), Logistic Regression (LogR), Multilayer Perceptron (MLP) and Naive Bayes (NB); based on the parameters including: pH, dissolved oxygen, conductivity, turbidity, fecal coliform and temperature. The classification results showed that the DT algorithm outperformed other models with a classification accuracy of 99%. Although WQI is a popular method used to assess the water quality, there is a need to address the uncertainties and biases introduced by the limitations of data acquisition (such as specific location/area, type and number of parameters or water type) leading to class imbalance. This can be achieved by developing a more refined index that considers various other factors such as topographical and hydrological parameters with spatial temporal variations combined machine learning techniques to effectively contribute in estimation of water quality for all regions. HIGHLIGHTS Evaluated five WQI based on six physico-chemical parameters to analyze their sensitivity toward selected location, type and frequency for data sampling.; Computed WQIs categorized the dataset as ‘Very Poor’ because of the uneven distribution of water samples leading to class imbalance.; Five ML models used in which Decision Tree classification accuracy is 99%.; For refined index topographical and hydrological parameters should be considered.;
Keywords