Journal of Water and Soil (Feb 2017)

Modeling Water Quality Parameters Using Data-driven Methods

  • Shima Soleimani,
  • Omid Bozorg Haddad,
  • Mojtaba Moravej

DOI
https://doi.org/10.22067/jsw.v30i3.43806
Journal volume & issue
Vol. 30, no. 3
pp. 743 – 757

Abstract

Introduction: Surface water bodies are the most readily available water resources. Increased use of surface water and increased wastewater discharge cause drastic changes in surface water quality. The importance of water quality is clear, since surface waters are among the most vulnerable and most important sources of water supply. In recent years, urban population growth, economic development, and rising industrial production have increased the entry of pollutants into water bodies. Water quality parameters express the physical, chemical, and biological characteristics of water, so water quality monitoring is more necessary than ever. Each use of water, such as agriculture, drinking, industry, and aquaculture, requires water of a particular quality. Accurate estimation of the concentrations of water quality parameters is therefore important. Materials and Methods: In this research, two input variable selection methods (the correlation coefficient and principal component analysis, PCA) were first applied to select the model inputs. Data processing consists of three steps: (1) examining the data, (2) identifying the input data that affect the output data, and (3) selecting the training and testing data. A Genetic Algorithm-Least Squares Support Vector Regression (GA-LSSVR) algorithm was developed to model the water quality parameters. The LSSVR method assumes that the relationship between the input and output variables is nonlinear, but a nonlinear mapping creates a space, called the feature space, in which that relationship is linear. The genetic algorithm (GA) is an evolutionary algorithm that can automatically find the optimal LSSVR coefficients, so the developed algorithm maximizes the accuracy of the LSSVR method by tuning its parameters automatically.
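The GA-LSSVR idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the RBF kernel, the parameter names `gamma` (regularization) and `sigma` (kernel width), the search bounds, and the very small genetic algorithm (elitist selection, blend crossover, Gaussian mutation) are all assumptions made for illustration. The LSSVR part uses the standard least squares formulation, in which training reduces to solving one linear system.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    # Pairwise squared Euclidean distances between rows of A and rows of B.
    d2 = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def lssvr_fit(X, y, gamma, sigma):
    # Solve [[0, 1^T], [1, K + I/gamma]] [b, alpha]^T = [0, y]^T.
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]  # bias b, dual coefficients alpha

def lssvr_predict(X_train, b, alpha, sigma, X_new):
    # f(x) = sum_i alpha_i * k(x, x_i) + b
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

def rmse(obs, sim):
    return float(np.sqrt(np.mean((obs - sim) ** 2)))

def ga_tune(X_tr, y_tr, X_val, y_val, pop=20, gens=15, seed=0):
    # Hypothetical search space: log10(gamma) in [-1, 3], log10(sigma) in [-1, 1].
    rng = np.random.default_rng(seed)
    lo, hi = np.array([-1.0, -1.0]), np.array([3.0, 1.0])
    P = rng.uniform(lo, hi, size=(pop, 2))

    def fitness(genes):
        gamma, sigma = 10.0 ** genes
        b, alpha = lssvr_fit(X_tr, y_tr, gamma, sigma)
        return rmse(y_val, lssvr_predict(X_tr, b, alpha, sigma, X_val))

    for _ in range(gens):
        scores = np.array([fitness(g) for g in P])
        elite = P[np.argsort(scores)[: pop // 2]]      # keep the better half
        k = pop - len(elite)
        i = rng.integers(0, len(elite), size=k)
        j = rng.integers(0, len(elite), size=k)
        w = rng.uniform(size=(k, 1))
        children = w * elite[i] + (1.0 - w) * elite[j]  # blend crossover
        children += rng.normal(0.0, 0.1, children.shape)  # Gaussian mutation
        P = np.vstack([elite, np.clip(children, lo, hi)])

    scores = np.array([fitness(g) for g in P])
    best = P[np.argmin(scores)]
    return 10.0 ** best[0], 10.0 ** best[1], float(scores.min())
```

In this sketch the GA fitness is the RMSE on a held-out validation split, which is one reasonable choice; the abstract does not state which objective the authors optimized.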
The GA-LSSVR algorithm was employed to model water quality parameters such as Na+, K+, Mg2+, SO42-, Cl-, pH, electrical conductivity (EC), and total dissolved solids (TDS) in the Sefidrood River. To compare the input variable selection methods, the coefficient of determination (R2), root mean square error (RMSE), and Nash-Sutcliffe (NS) efficiency were used. Results and Discussion: According to Table 5, the GA-LSSVR algorithm gives approximately similar results with the correlation coefficient and PCA methods. For pH, EC, and TDS, the PCA method is more accurate, but the difference in RMSE between the PCA and correlation coefficient methods is not significant. Relative to the correlation coefficient method, the PCA method improves the NS values by 22 and 0.1 percent for pH and TDS, respectively, while the NS value for EC is the same in both methods. Given the positive NS values obtained with both the PCA and correlation coefficient methods, GA-LSSVR has a high ability to model water quality parameters. Because the sum of the NS values is 5.53 for the PCA method and 5.62 for the correlation coefficient method, the correlation coefficient method is the more applicable data processing method, although both methods perform well. Orouji et al. (18) used assumed models to model Na+, K+, Mg2+, SO42-, Cl-, pH, EC, and TDS by the genetic programming (GP) method. The RMSE values of their best models for the testing data are 2.1, 0.02, 0.85, 0.93, 2.18, 0.33, 404.15, and 246.15, respectively. Comparing Orouji et al. (18) with Table 5 shows that using the correlation coefficient method as a data processing method can improve the results by up to 5.5 times. These results indicate that the developed algorithm increases modeling accuracy.
It is worth mentioning that, according to the NS criterion, both input variable selection methods (correlation coefficient and PCA) are capable of modeling the water quality parameters. The results also show that the correlation coefficient method leads to more accurate results than PCA. Conclusion: In this study, the GA, one of the most widely applied optimization algorithms across the sciences, was used to optimize the LSSVR coefficients, and the resulting GA-LSSVR algorithm was developed to model the water quality parameters. To compare the data processing methods (correlation coefficient and PCA), the input variables of both methods were determined and GA-LSSVR was run for each set of input variables. Several statistics were used to compare the results of the PCA and correlation coefficient methods. According to the NS criterion, both input selection methods are capable of modeling water quality parameters, and the correlation coefficient method leads to more accurate results than PCA.
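The three comparison statistics named in the abstract (R2, RMSE, and NS) can be computed as in the short sketch below. The function names are hypothetical, and R2 is taken here as the squared Pearson correlation between observed and simulated values, which is an assumption because the abstract does not give its definition. NS follows the usual Nash-Sutcliffe efficiency: one minus the ratio of the squared model error to the variance of the observations around their mean.

```python
import numpy as np

def rmse(obs, sim):
    # Root mean square error between observed and simulated values.
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))

def nash_sutcliffe(obs, sim):
    # NS = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))

def r_squared(obs, sim):
    # Assumed definition: squared Pearson correlation coefficient.
    r = np.corrcoef(np.asarray(obs, float), np.asarray(sim, float))[0, 1]
    return float(r ** 2)
```

A positive NS means the model predicts better than the mean of the observations, which is the sense in which the abstract reads positive NS values as evidence of modeling ability. Under the assumed definition, R2 is insensitive to a constant bias in the simulation, while RMSE and NS both penalize it.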

Keywords