Atmospheric Chemistry and Physics (May 2023)

Technical note: Improving the European air quality forecast of the Copernicus Atmosphere Monitoring Service using machine learning techniques

  • J.-M. Bertrand,
  • F. Meleux,
  • A. Ung,
  • G. Descombes,
  • A. Colette

DOI
https://doi.org/10.5194/acp-23-5317-2023
Journal volume & issue
Vol. 23
pp. 5317 – 5333

Abstract

Model output statistics (MOS) approaches relying on machine learning algorithms were applied to downscale regional air quality forecasts produced by CAMS (Copernicus Atmosphere Monitoring Service) at hundreds of monitoring sites across Europe. Besides the CAMS forecast, the predictors in the MOS typically include meteorological variables as well as ancillary data. We first explored a “local” approach in which a specific model is trained at each site. An alternative “global” approach, in which a single model is trained with data from the whole geographical domain, was also investigated. In both cases, local predictors are used for a given station in predictive mode. Because of its global nature, the latter approach can capture a variety of meteorological situations within a very short training period and is therefore better suited to the operational constraints on the training of the MOS (frequent upgrades of the modelling system, addition of new monitoring sites). Both approaches have been implemented using a variety of machine learning algorithms: random forest, gradient boosting, and standard and regularized multi-linear models. The quality of the MOS predictions is evaluated in this work for four key pollutants, namely particulate matter (PM10 and PM2.5), ozone (O3) and nitrogen dioxide (NO2), according to scores based on the predictive errors and on the detection of pollution peaks (exceedances of the regulatory thresholds). Both the local and the global approaches significantly improve the performance of the raw ensemble forecast. The most important result of this study is that the global approach competes with and can even outperform the local approach in some cases. This global approach gives the best RMSE scores when relying on a random forest model for the prediction of daily mean, daily max and hourly concentrations. By contrast, the gradient boosting model is better suited for the detection of exceedances of the European Union regulated threshold values for O3 and PM10.
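
The difference between the local and global training strategies, and the two kinds of scores mentioned above (RMSE and exceedance detection), can be illustrated with a minimal sketch. It is not the authors' implementation. A one-predictor least-squares fit stands in for the random forest, gradient boosting and multi-linear models of the study. The data are synthetic, and the station names, the 14-day training split, the threshold value and all function names are assumptions made for illustration only.

```python
import math

THRESHOLD = 50.0  # assumed exceedance threshold, illustrative value only


def fit_linear(x, y):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def rmse(pred, obs):
    """Root mean square error between predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def exceedances(pred, obs, threshold):
    """Return (hits, misses, false_alarms) for values strictly above threshold."""
    pairs = list(zip(pred, obs))
    hits = sum(p > threshold and o > threshold for p, o in pairs)
    misses = sum(p <= threshold and o > threshold for p, o in pairs)
    false_alarms = sum(p > threshold and o <= threshold for p, o in pairs)
    return hits, misses, false_alarms


# Synthetic data (assumption): one raw forecast series shared by two
# hypothetical stations, each with its own offset and a small alternating error.
days = range(20)
forecast = [20.0 + 2.0 * t for t in days]
noise = [1.0 if t % 2 == 0 else -1.0 for t in days]
offsets = {"station_A": 5.0, "station_B": -3.0}
obs = {s: [1.2 * f + off + e for f, e in zip(forecast, noise)]
       for s, off in offsets.items()}

n_train = 14  # first 14 days for training, remaining 6 for evaluation
train_x, test_x = forecast[:n_train], forecast[n_train:]

# Local approach: one model per station, trained on that station only.
local = {s: fit_linear(train_x, obs[s][:n_train]) for s in obs}

# Global approach: a single model trained on the pooled data of all stations.
pooled_y = [v for s in obs for v in obs[s][:n_train]]
global_a, global_b = fit_linear(train_x * len(obs), pooled_y)

# Evaluate the raw forecast and both MOS variants on the held-out days.
test_obs, raw, local_pred, global_pred = [], [], [], []
for s in obs:
    a, b = local[s]
    test_obs += obs[s][n_train:]
    raw += test_x
    local_pred += [a + b * x for x in test_x]
    global_pred += [global_a + global_b * x for x in test_x]

scores = {name: (rmse(p, test_obs), exceedances(p, test_obs, THRESHOLD))
          for name, p in [("raw", raw), ("local", local_pred), ("global", global_pred)]}
for name, (err, counts) in scores.items():
    print(f"{name:6s} RMSE={err:5.2f} (hits, misses, false alarms)={counts}")
```

In this toy setting the global model sees only the raw forecast, so it cannot represent the station-specific offsets and the local models score better. The global approach described in the abstract also uses local predictors for each station, which this sketch deliberately leaves out for brevity, so the ranking printed here should not be read as a reproduction of the study's results.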