Environment International (Aug 2020)

Construction of a virtual PM2.5 observation network in China based on high-density surface meteorological observations using the Extreme Gradient Boosting model

  • Ke Gui,
  • Huizheng Che,
  • Zhaoliang Zeng,
  • Yaqiang Wang,
  • Shixian Zhai,
  • Zemin Wang,
  • Ming Luo,
  • Lei Zhang,
  • Tingting Liao,
  • Hujia Zhao,
  • Lei Li,
  • Yu Zheng,
  • Xiaoye Zhang

Journal volume & issue
Vol. 141
p. 105801

Abstract

With increasing public concern about air pollution in China, there is a demand for long-term continuous PM2.5 datasets. However, it was not until the end of 2012 that China established a national PM2.5 observation network. Before that, satellite-retrieved aerosol optical depth (AOD) was frequently used as a primary predictor to estimate surface PM2.5. Nevertheless, satellite-retrieved AOD often suffers from incomplete daily coverage due to its sampling frequency and interference from clouds, which greatly affects the representativeness of these AOD-based PM2.5 estimates. Here, we constructed a virtual ground-based PM2.5 observation network at 1180 meteorological sites across China using the Extreme Gradient Boosting (XGBoost) model with high-density meteorological observations as major predictors. Cross-validation of the XGBoost model showed strong robustness and high accuracy in its estimation of the daily (monthly) PM2.5 across China in 2018, with R2, root-mean-square error (RMSE) and mean absolute error (MAE) values of 0.79 (0.92), 15.75 μg/m3 (6.75 μg/m3) and 9.89 μg/m3 (4.53 μg/m3), respectively. Meanwhile, we found that surface visibility plays the dominant role in terms of the relative importance of variables in the XGBoost model, accounting for 39.3% of the overall importance. We then used meteorological and PM2.5 data from the year 2017 to assess the predictive capability of the model. Results showed that the XGBoost model can accurately hindcast historical PM2.5 at monthly (R2 = 0.80, RMSE = 14.75 μg/m3), seasonal (R2 = 0.86, RMSE = 12.28 μg/m3), and annual (R2 = 0.81, RMSE = 10.10 μg/m3) mean levels. In general, the newly constructed virtual PM2.5 observation network shows great potential for reconstructing historical PM2.5 at ~1000 meteorological sites across China. It will help fill gaps in AOD-based PM2.5 data and will benefit other environmental studies, including epidemiology.
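
The three accuracy measures quoted in the abstract (R2, RMSE and mean absolute error) can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `evaluation_metrics` and the sample values are hypothetical, and R2 is assumed here to be the coefficient of determination, whereas the paper may instead use the squared correlation coefficient.

```python
import math


def evaluation_metrics(observed, predicted):
    """Return (R2, RMSE, MAE) for paired observed and predicted values.

    R2 is computed as the coefficient of determination, which is an
    assumption; the abstract does not state which definition was used.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [o - p for o, p in zip(observed, predicted)]

    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observations are equal")

    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    return r2, rmse, mae


# Hypothetical PM2.5 values in ug/m3, for illustration only.
observed = [35.0, 50.0, 80.0, 20.0]
predicted = [30.0, 55.0, 70.0, 25.0]

r2, rmse, mae = evaluation_metrics(observed, predicted)
print(f"R2 = {r2:.2f}, RMSE = {rmse:.2f} ug/m3, MAE = {mae:.2f} ug/m3")
```

Applied to daily values and then to their monthly means, the same function would give the two sets of figures reported in parentheses in the abstract. RMSE penalises large errors more heavily than MAE, which is why the two are usually reported together.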

Keywords