Comparative analysis of machine learning models for predicting river water quality: a case study of the Zayandeh Rood River

Elham Fazel Najafabadi; Paria Shojaei; Mojgan Askarizadeh

doi:10.1016/j.rineng.2025.106665

Results in Engineering (Sep 2025)

Affiliations

Elham Fazel Najafabadi: Department of Water Science and Engineering, College of Agriculture, Isfahan University of Technology, Isfahan, Iran; Corresponding author.
Paria Shojaei: Department of Architecture and Civil Engineering, University of Bath, Bath, UK
Mojgan Askarizadeh: Department of Computer Engineering, Faculty of Engineering, Ardakan University, Ardakan, Yazd, Iran

DOI: https://doi.org/10.1016/j.rineng.2025.106665
Journal volume & issue: Vol. 27, p. 106665

Abstract

Because rivers supply drinking water and support industry, agriculture, and ecosystems, water quality assessment and pollution quantification are essential for sustainable use. This study evaluated five machine learning models, namely Lasso Regression, Random Forest (RF), Gradient Boosting (GB), XGBoost, and Support Vector Machine (SVM), for predicting four water quality parameters: Electrical Conductivity (EC), Total Dissolved Solids (TDS), Sodium Adsorption Ratio (SAR), and Total Hardness (TH). The data were collected over a 31-year period from eight monitoring stations along the Zayandeh Rood River, a vital water source for drinking, agriculture, and industry in the arid region of central Iran. The models were evaluated using five statistical criteria: R², RMSE, RRMSE, r, and MAE. Two dimensionality reduction techniques, PCA and correlation matrix-based feature reduction, were implemented to enhance model efficiency and mitigate multicollinearity. The findings indicate that the best-performing model for a given parameter varied across stations, although the differences in evaluation metrics between the best models were small at most stations. The GB and SVM models outperformed the other models in predicting EC and TDS (0.80<R²<0.99). For SAR, the GB and XGBoost models achieved higher accuracy (0.955<R²<0.999), and for TH, the Lasso and SVM models did (0.830<R²<0.996). The Lasso regression model proved the most effective for predicting TH at half of the monitoring stations.
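As a rough illustration of the five evaluation criteria named in the abstract, the sketch below computes R², RMSE, RRMSE, r, and MAE for paired observed and predicted values. It is not the authors' code. The function name `evaluation_metrics` is hypothetical, and two definitions are assumptions because the abstract does not state them: R² is taken as the coefficient of determination (1 minus the ratio of residual to total sum of squares), and RRMSE as RMSE divided by the mean of the observed values.

```python
import math


def evaluation_metrics(observed, predicted):
    """Return R2, RMSE, RRMSE, Pearson r, and MAE for paired samples.

    Assumptions (not stated in the abstract):
    - R2 is the coefficient of determination, 1 - SS_res / SS_tot.
    - RRMSE is RMSE divided by the mean of the observed values.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n

    # Residuals and the sums of squares used by several metrics.
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    cov = sum((o - mean_obs) * (p - mean_pred)
              for o, p in zip(observed, predicted))

    rmse = math.sqrt(ss_res / n)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": rmse,
        "RRMSE": rmse / mean_obs,                # assumed normalization
        "r": cov / math.sqrt(ss_tot * ss_pred),  # Pearson correlation
        "MAE": sum(abs(e) for e in residuals) / n,
    }


if __name__ == "__main__":
    # Synthetic values for illustration only; not data from the study.
    obs = [2.0, 4.0, 6.0, 8.0]
    pred = [3.0, 5.0, 7.0, 9.0]
    for name, value in evaluation_metrics(obs, pred).items():
        print(f"{name}: {value:.3f}")
```

The synthetic example also shows why r and R² are reported separately: predictions offset by a constant remain perfectly correlated with the observations (r = 1) while R² drops below 1. The function raises a division-by-zero error when the observed or predicted values are all identical, which a fuller implementation would need to handle.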

Published in Results in Engineering

ISSN: 2590-1230 (Online)
Publisher: Elsevier
Country of publisher: Netherlands
LCC subjects: Technology
Website: https://www.journals.elsevier.com/results-in-engineering
