Scientific Reports (Oct 2024)

Development of soft computing-based models for forecasting water quality index of Lorestan Province, Iran

  • Balraj Singh,
  • Alireza Sepahvand,
  • Parveen Sihag,
  • Karan Singh,
  • Chander Prabha,
  • Anindya Nag,
  • Md. Mehedi Hassan,
  • S. Vimal,
  • Dongwann Kang

DOI
https://doi.org/10.1038/s41598-024-76894-w
Journal volume & issue
Vol. 14, no. 1
pp. 1 – 17

Abstract

Read online

Abstract The Water Quality Index (WQI) is widely used as a classification indicator and essential parameter for water resources management projects. WQI combines several physical and chemical parameters into a single metric to measure the status of Water Quality. This study explores the application of five soft computing techniques, including Gene Expression Programming, Gaussian Process, Reduced Error Pruning Tree (REPt), Artificial Neural Network with FireFly (ANN-FFA), and combinations of Reduced Error Pruning Tree with bagging. These models aim to predict the WQI of Khorramabad, Biranshahr, and Alashtar sub-watersheds in Lorestan province, Iran. The dataset consists of 124 observations, with input variables being sulfate (SO4), total dissolved solids (TDS), the potential of Hydrogen (pH), chloride (Cl), electrical conductivity (EC), Potassium (K), bicarbonate (HCO), magnesium (Mg), sodium (Na), and calcium (Ca), and WQI as the output variable. For model creation (train subset) and model validation (test subset), the data were split into two subsets (train and test) in a ratio of 70:30. The performance evaluation parameters values of training and testing stages of various models indicate that the ANN-FFA based data-driven model performs better than the other modeling techniques applied with the values of coefficient of correlation 0.9990 & 0.9989; coefficient of determination 0.9612 & 0.9980; root mean square error 0.3036 & 0.3340; Nash–Sutcliffe error 0.9980 & 0.9979; and Mean average percentage error 0.7259% & 0.7969% for the train and test subsets, respectively. Taylor diagram results also suggest that ANN-FFA is the best-performing model, followed by the GEP model. This study introduces a novel model for predicting WQI using advanced soft computing models that have not been previously applied in this study area, highlighting its novelty and relevance. The proposed model significantly enhances predictive accuracy and efficiency, offering real-time, cost-effective WQI predictions that outperform traditional methods in handling complex, nonlinear environmental data.

Keywords