Results in Engineering (Mar 2024)
Ozone concentration forecasting utilizing leveraging of regression machine learnings: A case study at Klang Valley, Malaysia
Abstract
Ground-level ozone is a significant source of air pollution in the Klang Valley. Ozone (O₃) concentration is affected by meteorological conditions and air pollutants. This study uses Linear Regression Models (LRM), Regression Trees (RT), Support Vector Machines (SVM), Ensembles of Trees (ET), Gaussian Process Regression (GPR), and Neural Networks (NN) in a thorough analysis of how accurately various machine learning models forecast ground-level O₃ concentration. The primary contribution of this research is a comparison of regression model performance based on root mean squared error (RMSE), coefficient of determination (R²), mean squared error (MSE), mean absolute error (MAE), prediction speed, and training time. Overall, exponential GPR outperformed the other regression models in scenario 1 (S-1), scenario 2 (S-2), scenario 3 (S-3), and scenario 4 (S-4) when multiple lags were incorporated into the respective scenarios, and the new testing method, "re-substitution", proved more reliable and consistent than using 20 % of the same dataset for model testing. The findings showed that GPR produced accurate results, with R² = 0.98, 0.95, 0.96, and 0.96 for S-1, S-2, S-3, and S-4, respectively.
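The four error indicators named in the abstract, and the idea of feeding lagged values to a regression model, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `regression_metrics` and `make_lagged` are hypothetical, the formulas are the standard textbook definitions, and the numbers in the usage example are made up for illustration, not data from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R2 for two equal-length numeric sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    ss_res = sum(e * e for e in errors)           # residual sum of squares
    mse = ss_res / n                              # mean squared error
    mae = sum(abs(e) for e in errors) / n         # mean absolute error

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    # R2 is undefined when the observations are constant.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "R2": r2}


def make_lagged(series, n_lags):
    """Pair each value with its n_lags predecessors.

    Returns a list of (features, target) tuples, where features is
    [x[t - n_lags], ..., x[t - 1]] and target is x[t].
    """
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    return [
        (list(series[t - n_lags:t]), series[t])
        for t in range(n_lags, len(series))
    ]


if __name__ == "__main__":
    # Made-up observed and predicted O3 values (illustrative only).
    observed = [40.0, 50.0, 60.0, 70.0]
    predicted = [43.0, 48.0, 61.0, 69.0]
    print(regression_metrics(observed, predicted))
    print(make_lagged(observed, 2))
```

Under re-substitution, a model would be scored by calling `regression_metrics` on the same observations it was trained on, whereas a hold-out test would score it only on the reserved portion of the data.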