Water Practice and Technology (Feb 2024)

Machine learning, Water Quality Index, and GIS-based analysis of groundwater quality

  • Ghulam Shabir Solangi,
  • Zouhaib Ali,
  • Muhammad Bilal,
  • Muhammad Junaid,
  • Sallahuddin Panhwar,
  • Hareef Ahmed Keerio,
  • Iftikhar Hussain Sohu,
  • Sheeraz Gul Shahani,
  • Noor Zaman

DOI
https://doi.org/10.2166/wpt.2024.014
Journal volume & issue
Vol. 19, no. 2
pp. 384 – 400

Abstract

Water is essential for life, as it supports bodily functions, nourishes crops, and maintains ecosystems. Drinking water is crucial for maintaining good health and can also contribute to economic development by reducing healthcare costs and improving productivity. In this study, we employed five different machine learning algorithms – logistic regression (LR), decision tree classifier (DTC), extreme gradient boosting (XGB), random forest (RF), and K-nearest neighbors (KNN) – to analyze the dataset, and their prediction performance was evaluated using four metrics: accuracy, precision, recall, and F1 score. Physicochemical parameters of 30 groundwater samples were analyzed to determine the Water Quality Index (WQI) of Pano Aqil city, Pakistan. The samples were categorized into four classes based on their WQI values: excellent water, good water, poor water, and unfit for drinking. The WQI scores showed that only 43.33% of the samples were acceptable for drinking, so the majority (56.67%) were unsuitable. The findings suggest that the DTC and XGB algorithms outperform all other algorithms, each achieving an overall accuracy of 100%. In contrast, RF, KNN, and LR exhibit overall accuracies of 88, 75, and 50%, respectively. Researchers seeking to enhance water quality using machine learning can benefit from the models described in this study for water quality prediction.

HIGHLIGHTS

  • Groundwater quality is evaluated using the Water Quality Index method.
  • Machine learning algorithms are used for forecasting groundwater quality.
  • The predictive capabilities of decision tree classifier, extreme gradient boosting, logistic regression, random forest, and K-nearest neighbors models have been evaluated and compared.
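The four evaluation metrics named in the abstract (accuracy, precision, recall, and F1 score) can be illustrated with a minimal Python sketch for a four-class WQI problem. Everything below is an assumption made for illustration: the function name `classification_metrics`, the short class labels, the macro-averaging of the per-class scores, and the eight example labels are hypothetical and are not the study's code or data. The abstract does not state how the per-class metrics were averaged.

```python
from collections import Counter

# Short labels for the four WQI classes named in the abstract (labels are hypothetical).
CLASSES = ["excellent", "good", "poor", "unfit"]


def classification_metrics(y_true, y_pred, classes=CLASSES):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro-averaging (an unweighted mean over classes) is an assumption;
    the source does not say which averaging scheme was used.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrong
            fn[truth] += 1   # true class was missed

    precisions, recalls, f1s = [], [], []
    for c in classes:
        if tp[c] + fp[c] + fn[c] == 0:
            continue  # class appears in neither labels nor predictions
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(precisions)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical labels for illustration only; not data from the study.
y_true = ["excellent", "good", "good", "poor", "poor", "unfit", "unfit", "unfit"]
y_pred = ["excellent", "good", "poor", "poor", "poor", "unfit", "unfit", "unfit"]

for name, value in classification_metrics(y_true, y_pred).items():
    print(f"{name}: {value:.3f}")
```

With so few samples, a single misclassification shifts every metric noticeably, which is worth keeping in mind when comparing models on a small dataset.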

Keywords