Environment International (Mar 2019)

Estimation of daily PM10 and PM2.5 concentrations in Italy, 2013–2015, using a spatiotemporal land-use random-forest model

  • Massimo Stafoggia,
  • Tom Bellander,
  • Simone Bucci,
  • Marina Davoli,
  • Kees de Hoogh,
  • Francesca de' Donato,
  • Claudio Gariazzo,
  • Alexei Lyapustin,
  • Paola Michelozzi,
  • Matteo Renzi,
  • Matteo Scortichini,
  • Alexandra Shtein,
  • Giovanni Viegi,
  • Itai Kloog,
  • Joel Schwartz

Journal volume & issue
Vol. 124
pp. 170 – 179

Abstract

Particulate matter (PM) air pollution is one of the major causes of death worldwide, with demonstrated adverse effects from both short-term and long-term exposure. Most epidemiological studies have been conducted in cities because reliable spatiotemporal estimates of particle exposure are lacking in nonurban settings. The objective of this study is to estimate daily concentrations of PM10 (PM < 10 μm), fine particles (PM < 2.5 μm, PM2.5) and coarse particles (PM between 2.5 and 10 μm, PM2.5–10) on a 1-km² grid for 2013–2015 using a machine learning approach, the Random Forest (RF). Separate RF models were defined to: predict PM2.5 and PM2.5–10 concentrations at monitors where only PM10 data were available (stage 1); impute missing satellite Aerosol Optical Depth (AOD) data using estimates from atmospheric ensemble models (stage 2); establish a relationship between measured PM and satellite, land-use and meteorological parameters (stage 3); apply the stage 3 model to each 1-km² grid cell of Italy (stage 4); and improve stage 3 predictions by using small-scale predictors computed at the monitor locations or within a small buffer (stage 5). Our models were able to capture most of the PM variability, with mean cross-validation (CV) R² of 0.75 and 0.80 (stage 3) and 0.84 and 0.86 (stage 5) for PM10 and PM2.5, respectively. Model performance was lower for PM2.5–10, in summer months and in southern Italy. Finally, predictions were equally good in capturing annual and daily PM variability; therefore, they can be used as reliable exposure estimates for investigating long-term and short-term health effects.

Keywords: Aerosol optical depth, Exposure assessment, Machine learning, Particulate matter, Random forest, Satellite
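The abstract reports cross-validated R² values and states that predictions capture annual and daily variability equally well. The following is a minimal sketch of one way such a comparison could be quantified, not the authors' code: it assumes held-out predictions are available as (monitor, observed, predicted) triples, defines R² as 1 − SSE/SST, and treats per-monitor means as the "annual" (spatial) component and deviations from those means as the "daily" (temporal) component. The function names and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def r2(obs, pred):
    """Coefficient of determination, 1 - SSE/SST (an assumed definition)."""
    obs_mean = mean(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - obs_mean) ** 2 for o in obs)
    return 1.0 - sse / sst


def spatial_temporal_r2(records):
    """Split held-out performance into overall, spatial and temporal R2.

    records: list of (monitor_id, observed, predicted) triples, e.g.
    cross-validation predictions of daily PM at monitoring sites.
    """
    by_monitor = defaultdict(list)
    for monitor, obs, pred in records:
        by_monitor[monitor].append((obs, pred))

    monitors = sorted(by_monitor)
    mean_obs = {m: mean(o for o, _ in by_monitor[m]) for m in monitors}
    mean_pred = {m: mean(p for _, p in by_monitor[m]) for m in monitors}

    # Spatial ("annual") component: compare per-monitor means.
    spatial = r2([mean_obs[m] for m in monitors],
                 [mean_pred[m] for m in monitors])

    # Temporal ("daily") component: compare deviations from each
    # monitor's own mean.
    dev_obs, dev_pred = [], []
    for m in monitors:
        for o, p in by_monitor[m]:
            dev_obs.append(o - mean_obs[m])
            dev_pred.append(p - mean_pred[m])
    temporal = r2(dev_obs, dev_pred)

    overall = r2([o for _, o, _ in records], [p for _, _, p in records])
    return overall, spatial, temporal


# Hypothetical held-out values (in ug/m3) for two monitors, illustration only.
records = [
    ("A", 20.0, 22.0), ("A", 30.0, 28.0), ("A", 40.0, 40.0),
    ("B", 10.0, 12.0), ("B", 14.0, 13.0), ("B", 18.0, 17.0),
]
overall, spatial, temporal = spatial_temporal_r2(records)
print(f"overall={overall:.3f} spatial={spatial:.3f} temporal={temporal:.3f}")
```

Similar spatial and temporal values would be consistent with the claim that the estimates suit both long-term and short-term exposure studies. The study itself may define these quantities differently.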