Journal of Big Data (May 2023)
A comparison of machine learning methods for ozone pollution prediction
Abstract
Precise and efficient prediction of ozone ($\mathrm{O}_3$) concentration is crucial for weather monitoring and environmental policymaking because high $\mathrm{O}_3$ pollution levels harm human health and ecosystems. However, the complexity of $\mathrm{O}_3$ formation mechanisms in the troposphere makes it challenging to model $\mathrm{O}_3$ accurately and quickly, especially in the absence of a process model. Data-driven machine learning techniques have demonstrated promising performance in modeling air pollution, particularly when a process model is unavailable. This study evaluates the predictive performance of nineteen machine learning models for ozone pollution prediction. Specifically, we assess how feature selection using Random Forest affects $\mathrm{O}_3$ concentration prediction and investigate the use of time-lagged measurements to improve prediction accuracy. Air pollution and meteorological data collected at King Abdullah University of Science and Technology are used. Results show that dynamic models using time-lagged data outperform static and reduced machine learning models. As measured by RMSE, incorporating time-lagged data improves the accuracy of machine learning models by 300% and 200% compared to static and reduced models, respectively. Importantly, the best dynamic model with time-lagged information requires only 0.01 s, indicating its practical use. The Diebold-Mariano test, a statistical test for comparing the forecasting accuracy of models, is also conducted.
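To make the idea of a dynamic model with time-lagged inputs concrete, the following minimal Python sketch builds lagged features from a single series and scores a naive persistence baseline with RMSE. It is an illustration only, not the study's pipeline: the function names `make_lagged` and `rmse`, the lag count, and the ozone values are all assumptions introduced here.

```python
from math import sqrt


def make_lagged(series, n_lags):
    """Return (X, y), where each row of X holds the n_lags values preceding y."""
    if n_lags < 1 or n_lags >= len(series):
        raise ValueError("n_lags must be between 1 and len(series) - 1")
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(list(series[t - n_lags:t]))  # oldest lag first, newest last
        y.append(series[t])
    return X, y


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - b) ** 2 for a, b in zip(y_true, y_pred)]
    return sqrt(sum(squared) / len(squared))


# Hypothetical ozone readings (illustrative values, not data from the study).
ozone = [30.0, 32.0, 35.0, 33.0, 31.0, 34.0]

# Two lags per sample: predict the reading at time t from t-2 and t-1.
X, y = make_lagged(ozone, n_lags=2)

# Persistence baseline: predict that the next value equals the most recent lag.
persistence = [row[-1] for row in X]

print("first sample:", X[0], "->", y[0])
print("persistence RMSE:", round(rmse(y, persistence), 3))
```

In this framing, a static model would see only concurrent predictors, whereas a dynamic model would also receive the rows of `X` as inputs. Any regressor could then be trained on `X` and `y` and compared against the persistence baseline using the same RMSE function.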
Keywords