Water (Jul 2023)

A Machine Learning Approach for the Estimation of Total Dissolved Solids Concentration in Lake Mead Using Electrical Conductivity and Temperature

  • Godson Ebenezer Adjovu,
  • Haroon Stephen,
  • Sajjad Ahmad

DOI
https://doi.org/10.3390/w15132439
Journal volume & issue
Vol. 15, no. 13
p. 2439

Abstract

Read online

Total dissolved solids (TDS) concentration determination in water bodies is sophisticated, time-consuming, and involves expensive field sampling and laboratory processes. TDS concentration has, however, been linked to electrical conductivity (EC) and temperature. Compared to monitoring TDS concentrations, monitoring EC and temperature is simpler, inexpensive, and takes less time. This study, therefore, applied several machine learning (ML) approaches to estimate TDS concentration in Lake Mead using EC and temperature data. Standalone models including the support vector machine (SVM), linear regressors (LR), K-nearest neighbor model (KNN), the artificial neural network (ANN), and ensemble models such as bagging, gradient boosting machine (GBM), extreme gradient boosting (XGBoost), random forest (RF), and extra trees (ET) models were used in this study. The models’ performance were evaluated using several performance metrics aimed at providing a holistic assessment of each model. Metrics used include the coefficient of determination (R2), mean absolute error (MAE), percent mean absolute relative error (PMARE), root mean square error (RMSE), the scatter index (SI), Nash–Sutcliffe model efficiency (NSE) coefficient, and percent bias (PBIAS). Results obtained showed varying model performance at the training, testing, and external validation stage of the models, with obtained R2 of 0.77–1.00, RMSE of 2.28–37.68 mg/L, an MAE of 0.14–22.67 mg/L, a PMARE of 0.02–3.42%, SI of 0.00–0.06, NSE of 0.77–1.00, and a PBIAS of 0.30–0.97 across all models for the three datasets. We utilized performance rankings to assess the model performance and found the LR to be the best-performing model on the external validation datasets among all the models (R2 of 0.82 and RMSE of 33.09 mg/L), possibly due to the established existence of a relationship between TDS and EC, although this may not always be linear. Similarly, we found the XGBoost to be the best-performing ensemble model based on the external validation with R2 of 0.81 and RMSE of 34.19 mg/L. Assessing the overall performance of the models across all the datasets, however, revealed GBM to produce a superior performance based on the ranks, possibly due to its ability to reduce overfitting and improve generalizations. The findings from this study could be employed in assisting water resources managers and stakeholders in effective monitoring and management of water resources to ensure their sustainability.

Keywords