대한환경공학회지 (Journal of Korean Society of Environmental Engineers) (Apr 2023)

Comparison of Automated Machine Learning Model Performance for Predicting Chlorophyll-a Concentration according to Measurement Frequency of Input Data

  • Jungsu Park

DOI
https://doi.org/10.4491/KSEE.2023.45.4.201
Journal volume & issue
Vol. 45, no. 4
pp. 201 – 209

Abstract

Objectives: Automated machine learning is a recent field of study that automates the development of machine learning models, including model selection and optimization. In this study, auto H2O, a novel automated machine learning algorithm, was used to develop a model to predict chlorophyll-a (chl-a) concentration.

Methods: To analyze the effect of the measurement frequency of the input data on model performance, models were developed with the auto H2O algorithm using datasets with observation frequencies of 1 h, 2 h, 8 h, 24 h, and 1 week. The effect of the concentration range of the input data was also examined by building models using datasets restricted to observed chl-a values exceeding 30 mg/m3. Model performance was evaluated using three indices: mean absolute error (MAE), Nash-Sutcliffe coefficient of efficiency (NSE), and root mean squared error-observation standard deviation ratio (RSR).

Results and Discussion: For the model using input data with a measurement frequency of 1 h, the MAE, NSE, and RSR were 0.8977, 0.9710, and 0.1704, respectively. Model performance improved as the measurement frequency of the input data increased: with the full data, the NSE was 0.9710, 0.9552, 0.8856, 0.8396, and 0.7509 for the input datasets with observation frequencies of 1 h, 2 h, 8 h, 24 h, and 1 week, respectively. The differences in performance across measurement frequencies were larger for the models using data with measured chl-a exceeding 30 mg/m3, for which the NSE was 0.8971, 0.8164, 0.5704, 0.5141, and 0.2052, respectively.

Conclusion: The auto H2O model for predicting chl-a performed better as the measurement frequency of the input data increased, and the difference in performance across measurement frequencies was larger in the range of observed chl-a concentrations exceeding 30 mg/m3.
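
The three evaluation indices named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so the code below assumes the definitions commonly used in hydrological modeling: MAE as the mean absolute residual, NSE as one minus the ratio of the residual sum of squares to the variance sum of squares of the observations, and RSR as the RMSE divided by the standard deviation of the observations. The function names and the sample values are hypothetical and are not taken from the study.

```python
import math


def mae(obs, sim):
    """Mean absolute error between observed and simulated values."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


def _sums_of_squares(obs, sim):
    """Return (residual sum of squares, sum of squares about the observed mean)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return sse, sst


def nse(obs, sim):
    """Nash-Sutcliffe efficiency (assumed standard form): 1 - SSE / SST."""
    sse, sst = _sums_of_squares(obs, sim)
    return 1.0 - sse / sst


def rsr(obs, sim):
    """RMSE divided by the observation standard deviation (assumed standard form).

    The 1/n factors cancel, so this equals sqrt(SSE) / sqrt(SST).
    """
    sse, sst = _sums_of_squares(obs, sim)
    return math.sqrt(sse) / math.sqrt(sst)


# Hypothetical chl-a values (mg/m3), for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

print(mae(observed, predicted))
print(nse(observed, predicted))
print(rsr(observed, predicted))
```

Under these assumed definitions, RSR equals the square root of (1 - NSE). The values reported for the 1 h model are consistent with this relation: the square root of (1 - 0.9710) is approximately 0.170, close to the reported RSR of 0.1704.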

Keywords