Environment International (May 2023)
A hybrid model for estimating the number concentration of ultrafine particles based on machine learning algorithms in central Taiwan
Abstract
Modeling is a cost-effective way to estimate ultrafine particle (UFP) levels. Previous UFP estimates generally relied on land-use regression, which offers insufficient temporal resolution. We carried out in-situ UFP measurements in central Taiwan and developed a model that incorporates satellite-based measurements, meteorological variables, and land-use data to estimate daily UFP levels at a 1-km resolution. Two sampling campaigns, in 2008–2010 and 2017–2021, measured hourly UFP concentrations at six sites using scanning mobility particle sizers. Three machine learning algorithms, namely random forest, eXtreme gradient boosting (XGBoost), and deep neural network, were used to develop UFP estimation models. Model performance was evaluated with 10-fold cross-validation as well as temporal and spatial validation. The campaigns yielded a total of 1,022 effective sampling days. The XGBoost model performed best, with a training coefficient of determination (R²) of 0.99 [normalized root mean square error (nRMSE): 6.52%] and a cross-validation R² of 0.78 (nRMSE: 31.0%). The ten most important variables were surface pressure, distance to the nearest road, temperature, calendar year, day of the year, NO₂, meridional wind, total length of roads, PM₂.₅, and zonal wind. UFP levels were elevated along the main roads in all seasons, suggesting that traffic emission is an important contributor to UFP. This hybrid model outperformed prior land-use regression models and thus can provide more accurate UFP estimates for epidemiological studies.
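The evaluation described in the abstract (R², nRMSE, and 10-fold cross-validation) can be illustrated with a minimal sketch. Several points here are assumptions rather than details from the paper: nRMSE is normalized by the mean of the observations (the abstract does not state the normalizer), the folds are drawn at random, and an ordinary least-squares fit stands in for the XGBoost model. All function names are hypothetical.

```python
import numpy as np


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def nrmse_percent(y_true, y_pred):
    """RMSE as a percentage of the observed mean.

    ASSUMPTION: normalization by the mean; the abstract does not say
    which normalizer (mean, range, ...) was used.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return 100.0 * rmse / y_true.mean()


def kfold_indices(n, k=10, seed=0):
    """Yield (train, test) index arrays for a random k-fold split."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]


def out_of_fold_predictions(X, y, k=10, seed=0):
    """Predict every sample from a model fitted without that sample's fold.

    ASSUMPTION: a least-squares linear model is a stand-in for the
    paper's machine learning models.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])  # add an intercept column
    preds = np.empty_like(y)
    for train, test in kfold_indices(len(y), k, seed):
        coef, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
        preds[test] = A[test] @ coef
    return preds
```

Calling `r2(y, out_of_fold_predictions(X, y))` and `nrmse_percent(y, out_of_fold_predictions(X, y))` gives cross-validation scores comparable in kind to those quoted above, while scoring in-sample predictions gives the training scores. Temporal or spatial validation would presumably replace the random folds with held-out time periods or held-out sites; the abstract does not describe how those splits were made.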