Environmental Research Communications (Jan 2024)
A data-driven approach for PM2.5 estimation in a metropolis: random forest modeling based on ERA5 reanalysis data
Abstract
Air pollution in urban environments, particularly from fine particulate matter (PM2.5), poses significant health risks. Addressing this issue, the current study developed a Random Forest (RF) model to estimate hourly PM2.5 concentrations in Ankara, Türkiye. Utilizing ERA5 reanalysis data, the model incorporated various meteorological and environmental variables. Over the period 2020–2021, the model’s performance was validated against data from eleven air quality monitoring stations, demonstrating a robust coefficient of determination (R²) of 0.73, signifying its strong predictive capability. Low root mean squared error (RMSE) and mean absolute error (MAE) values further affirmed the model’s precision. Seasonal and temporal analysis revealed the model’s adaptability, with autumn showing the highest accuracy (R² = 0.82) and summer the least (R² = 0.51), suggesting seasonal variability in predictive performance. Hourly evaluations indicated the model’s highest accuracy at 23:00 (R² = 0.93), reflecting a solid alignment with observed data during nocturnal hours. On a monthly scale, November’s predictions were the most precise (R² = 0.82), while May presented challenges in accuracy (R² = 0.49). These seasonal and monthly fluctuations underscore the complex interplay of atmospheric dynamics affecting PM2.5 dispersion. By integrating key determinants such as ambient air temperature, surface pressure, total column water vapor, boundary layer height, forecast albedo, and leaf area index, this study enhances the understanding of air pollution patterns in urban settings. The RF model’s comprehensive evaluation across time scales offers valuable insights for policymakers and environmental health practitioners, supporting evidence-based strategies for air quality management.
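The evaluation summarized above rests on three standard metrics (R², RMSE, MAE) computed overall and then again per season, hour, and month. The following is a minimal sketch of how such a grouped evaluation might be computed. It is not the authors' code: the function names, the record layout, and the toy values are hypothetical, and R² is taken here as 1 − SS_res/SS_tot, which may differ from the definition used in the study (for example, a squared correlation coefficient).

```python
from collections import defaultdict
from math import sqrt


def regression_metrics(observed, predicted):
    """Return R^2, RMSE and MAE for paired observed/predicted values.

    Assumes at least two observations that are not all identical,
    otherwise the total sum of squares is zero and R^2 is undefined.
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": sqrt(ss_res / n),
        "mae": sum(abs(o - p) for o, p in zip(observed, predicted)) / n,
    }


def metrics_by_group(records, key):
    """Compute the metrics separately for each value of `key`
    (for example a season, an hour of day, or a month)."""
    groups = defaultdict(lambda: ([], []))
    for rec in records:
        obs, pred = groups[rec[key]]
        obs.append(rec["observed"])
        pred.append(rec["predicted"])
    return {g: regression_metrics(o, p) for g, (o, p) in groups.items()}


# Hypothetical toy values, NOT data from the study.
records = [
    {"season": "autumn", "observed": 10.0, "predicted": 11.0},
    {"season": "autumn", "observed": 20.0, "predicted": 19.0},
    {"season": "autumn", "observed": 30.0, "predicted": 30.0},
    {"season": "summer", "observed": 8.0, "predicted": 12.0},
    {"season": "summer", "observed": 12.0, "predicted": 10.0},
    {"season": "summer", "observed": 16.0, "predicted": 14.0},
]

seasonal = metrics_by_group(records, "season")
for season, m in seasonal.items():
    print(f"{season}: R2={m['r2']:.2f} RMSE={m['rmse']:.2f} MAE={m['mae']:.2f}")
```

Passing a different key (an hour or month field, if the records carry one) would give the hourly or monthly breakdown in the same way. Because R² is normalized by the variance of the observations within each group, a group with a narrow range of concentrations can score a low R² even when its absolute errors are small, which is one reason to report RMSE and MAE alongside it.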
Keywords