Water (Apr 2022)

Simple Prediction of an Ecosystem-Specific Water Quality Index and the Water Quality Classification of a Highly Polluted River through Supervised Machine Learning

  • Alberto Fernández del Castillo,
  • Carlos Yebra-Montes,
  • Marycarmen Verduzco Garibay,
  • José de Anda,
  • Alejandro Garcia-Gonzalez,
  • Misael Sebastián Gradilla-Hernández

DOI
https://doi.org/10.3390/w14081235
Journal volume & issue
Vol. 14, no. 8
p. 1235

Abstract

Read online

Water quality indices (WQIs) are used for the simple assessment and classification of the water quality of surface water sources. However, considerable time, financial resources, and effort are required to measure the parameters used for their calculation. Prediction of WQIs through supervised machine learning is a useful and simple approach to reduce the cost of the analysis through the development of predictive models with a reduced number of water quality parameters. In this study, regression and classification machine-learning models were developed to estimate the ecosystem-specific WQI previously developed for the Santiago-Guadalajara River (SGR-WQI), which involves the measurement of 17 water quality parameters. The best subset selection method was employed to reduce the number of significant parameters required for the SGR-WQI prediction. The multiple linear regression model using 12 parameters displayed a residual square error (RSE) of 3.262, similar to that of the multiple linear regression model using 17 parameters (RSE = 3.255), which translates into significant savings for WQI estimation. Additionally, the generalized additive model not only displayed an adjusted R2 of 0.9992, which is the best fit of all the models evaluated, but also fitted the rating curves of each parameter developed for the original algorithm for the SGR-WQI calculation with great accuracy. Regarding the classification models, an overall proportion of 93% and 86% of data were correctly classified using the logistic regression model with 17 and 12 parameters, respectively, while the linear discriminant functions using 12 parameters correctly classified an overall proportion of 84%. The models evaluated were found to be efficient in predicting the SGR-WQI with a reduced number of parameters as complementary tools to extend the current water quality monitoring program of the Santiago-Guadalajara River.

Keywords