Atmosphere (Feb 2025)

Forecasting Ultrafine Dust Concentrations in Seoul: A Machine Learning Approach

  • Sophia Park
  • Myeong Jun Kim

DOI
https://doi.org/10.3390/atmos16030239
Journal volume & issue
Vol. 16, no. 3
p. 239

Abstract

This study applied several machine learning techniques, including shrinkage methods, XGBoost, complete subset regression (CSR), and random forest, to forecast ultrafine particulate matter (PM2.5) concentrations in Seoul, South Korea. The analysis incorporated key variables known to significantly influence PM2.5 levels, including meteorological data, coal-fired power generation, and PM2.5 concentrations in Dalian, China. Using daily data from 1 January 2018 to 30 June 2023, the study employed the Boruta algorithm, a variable selection technique based on the random forest model, to identify the most influential predictors of PM2.5 concentrations. Out-of-sample multi-period forecasts from each model were evaluated using the root mean squared error (RMSE), the mean absolute error (MAE), and the Giacomini–White test to determine the most effective forecasting approach. The random forest model with the Boruta algorithm outperformed all other models, improving the RMSE by 4% to 17% and the MAE by 4% to 16.5% across all forecast horizons. Overall, the random forest model and its variant incorporating the Boruta algorithm provided superior short-term forecasting performance. The Boruta algorithm identified the lagged values of temperature, PM2.5 concentration, mean humidity, and Dalian PM2.5 concentration as critical for accurately predicting PM2.5 levels in Seoul. These findings underscore the utility of data-driven approaches for improving air quality forecasting and management.
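
The accuracy comparison described in the abstract can be illustrated with a minimal sketch of the two error metrics and a percentage improvement over a reference forecast. This is an assumption-laden illustration, not the authors' code: the abstract does not name the benchmark against which the 4% to 17% improvements are measured, so `benchmark` here is a hypothetical reference forecast, the function names are invented for this example, and all concentration values are made up.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pct_improvement(benchmark_error, model_error):
    """Percentage reduction in error relative to a reference forecast."""
    return 100.0 * (benchmark_error - model_error) / benchmark_error


# Made-up daily PM2.5 values (micrograms per cubic metre), for illustration only.
actual = [20.0, 35.0, 18.0, 42.0]
benchmark = [24.0, 31.0, 22.0, 38.0]  # hypothetical reference forecast
model = [22.0, 33.0, 20.0, 40.0]      # hypothetical competing forecast

rmse_gain = pct_improvement(rmse(actual, benchmark), rmse(actual, model))
mae_gain = pct_improvement(mae(actual, benchmark), mae(actual, model))
print(f"RMSE improvement: {rmse_gain:.1f}%")
print(f"MAE improvement: {mae_gain:.1f}%")
```

In a multi-period setting, the same calculation would be repeated separately for each forecast horizon. Lower errors alone do not establish that one model is reliably better, which is why the study also reports the Giacomini–White test.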

Keywords