Atmospheric Chemistry and Physics (May 2021)

Himawari-8-derived diurnal variations in ground-level PM2.5 pollution across China using the fast space-time Light Gradient Boosting Machine (LightGBM)

  • J. Wei,
  • Z. Li,
  • R. T. Pinker,
  • J. Wang,
  • L. Sun,
  • W. Xue,
  • R. Li,
  • M. Cribb

DOI
https://doi.org/10.5194/acp-21-7863-2021
Journal volume & issue
Vol. 21
pp. 7863–7880

Abstract


Fine particulate matter with a diameter of less than 2.5 µm (PM2.5) is an important atmospheric environmental parameter, mainly because of its impact on human health. PM2.5 is affected by both natural and anthropogenic factors that usually have strong diurnal variations. Information on these diurnal variations helps us understand the causes of air pollution and adapt to it. Most existing PM2.5 products have been derived from polar-orbiting satellites, which typically provide only one or two daytime overpasses per day. This study uses the next-generation geostationary meteorological satellite Himawari-8/AHI (Advanced Himawari Imager) to document the diurnal variation in PM2.5. Given the huge volume of satellite data, we develop the space-time LightGBM (STLG) model. It builds on the highly efficient, tree-based Light Gradient Boosting Machine (LightGBM) and incorporates the spatiotemporal characteristics of air pollution. An hourly PM2.5 dataset for China (i.e., ChinaHighPM2.5) at a 5 km spatial resolution is derived from Himawari-8/AHI aerosol products together with additional environmental variables. Hourly PM2.5 estimates (number of data samples = 1 415 188) are well correlated with ground measurements in China (cross-validation coefficient of determination, CV-R² = 0.85), with a root-mean-square error (RMSE) and mean absolute error (MAE) of 13.62 and 8.49 µg m⁻³, respectively. Our model captures the PM2.5 diurnal variations well. Pollution increases gradually in the morning, peaks at about 10:00 LT (GMT+8), and then decreases steadily until sunset. The proposed approach outperforms most traditional statistical regression and tree-based machine-learning models while requiring much less computation time and memory, making it most suitable for routine pollution monitoring.
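The accuracy figures reported above (CV-R², RMSE and MAE) follow standard definitions. The sketch below shows how they can be computed from paired ground measurements and model estimates. It is not the authors' code. It assumes the out-of-fold predictions from every cross-validation fold are pooled before scoring. It also assumes R² is the coefficient of determination, 1 − SS_res/SS_tot, because some studies instead report the squared Pearson correlation. The function names `validation_metrics` and `pooled_cv_metrics` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def validation_metrics(observed: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Return R^2, RMSE and MAE for paired observed/predicted PM2.5 values.

    Assumption: R^2 is the coefficient of determination, 1 - SS_res / SS_tot.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and of equal length")
    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)                  # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)     # total sum of squares
    if ss_tot == 0:
        raise ValueError("observed values have zero variance; R^2 is undefined")
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),                      # same units as PM2.5 (ug/m3)
        "mae": sum(abs(r) for r in residuals) / n,
    }


def pooled_cv_metrics(folds: List[Tuple[Sequence[float], Sequence[float]]]) -> Dict[str, float]:
    """Pool held-out (observed, predicted) pairs from each fold, then score once.

    Assumption: the CV statistics are computed on the pooled out-of-fold
    predictions rather than averaged per fold.
    """
    all_obs: List[float] = []
    all_pred: List[float] = []
    for obs, pred in folds:
        all_obs.extend(obs)
        all_pred.extend(pred)
    return validation_metrics(all_obs, all_pred)


if __name__ == "__main__":
    # Toy values for illustration only, not data from the study.
    m = validation_metrics([10, 20, 30, 40], [12, 18, 33, 37])
    print(m)  # r2 = 0.948, rmse = sqrt(6.5), mae = 2.5
```

Pooling before scoring keeps the metrics comparable to a single R², RMSE and MAE over all samples. Averaging per-fold statistics would give slightly different numbers when the folds differ in size or variance.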