Water Quality Research Journal (May 2023)
Prediction of biochemical oxygen demand with genetic algorithm-based support vector regression
Abstract
Five-day biochemical oxygen demand (BOD5) is a vital indicator of wastewater contamination strength. BOD5 is determined by measuring the mass of molecular oxygen consumed in 1 L of water at 20 °C over a 5-day incubation period. The process is time-consuming, and the result often arrives too late for water management agencies to react in time when a water body turns out to be seriously polluted. Biosensors can simplify BOD5 measurement; however, their readings often deviate significantly from conventionally measured BOD5 values. The main aim of this research is to identify a machine learning model that can predict BOD5 values from historical data, making it easier to detect water pollution in advance and to adopt treatment measures in time. Three machine learning techniques, linear regression, support vector regression (SVR) and multi-layer perceptron (MLP), and two optimization processes were studied. The experiments comprised four main steps: preprocessing (one-time only), model training, model evaluation (testing) and analysis. Across three feature selection strategies, the experimental results showed that SVR with a genetic algorithm (GA) optimizer achieved the best performance, with an R² of 0.694 and the lowest MAE of 0.109.
HIGHLIGHTS
Genetic algorithm-based support vector regression is proposed to predict biochemical oxygen demand values from simple, easily measured variables.
Comparison experiments were conducted among three popular machine learning techniques with two optimization processes.
The best model, SVR with a genetic algorithm optimizer, achieved an R² of 0.694 and the lowest MAE of 0.109.
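For readers less familiar with the two evaluation metrics reported here, the following is a minimal sketch of how the coefficient of determination (R²) and the mean absolute error (MAE) are commonly computed. It is not the authors' code: the function names and the sample values are illustrative assumptions, and the numbers are made up rather than taken from the study.

```python
from statistics import fmean


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return fmean(abs(t - p) for t, p in zip(y_true, y_pred))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = fmean(y_true)
    # Residual sum of squares: error left over after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all observed values are equal")
    return 1 - ss_res / ss_tot


# Hypothetical observed and predicted values (illustrative only).
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]

mae = mean_absolute_error(observed, predicted)
r2 = r2_score(observed, predicted)
print(f"MAE = {mae:.3f}, R² = {r2:.3f}")
```

A higher R² means the model explains more of the variance in the observed values, while a lower MAE means smaller average errors, which is why the abstract reports the two together.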
Keywords