Environmental Research Letters (Jan 2022)

Reconstructing global PM2.5 monitoring dataset from OpenAQ using a two-step spatio-temporal model based on SES-IDW and LSTM

  • Siyu Tan,
  • Yuan Wang,
  • Qiangqiang Yuan,
  • Li Zheng,
  • Tongwen Li,
  • Huanfeng Shen,
  • LiangPei Zhang

DOI
https://doi.org/10.1088/1748-9326/ac52c9
Journal volume & issue
Vol. 17, no. 3
p. 034014

Abstract

Fine particulate matter (PM2.5) is of wide concern because of its harmful impacts on the global environment and human health, which makes air pollution monitoring crucial. As the world’s first open, real-time, and historical air quality platform, OpenAQ collects and provides government measurement and research-level data from various channels. However, although OpenAQ provides ground-measured PM2.5 worldwide, we find significant gaps in the time series at most sites. This incompleteness directly affects public perception of PM2.5 concentration levels and hinders research on air pollution. To address these issues, a two-step hybrid model named ST-SILM, i.e. a spatio-temporal model with single exponential smoothing-inverse distance weighted (SES-IDW) and long short-term memory (LSTM), is proposed to repair the missing data from PM2.5 sites worldwide collected from OpenAQ from 2017 to 2019. Both spatio-temporal correlation and neighborhood fields are considered and established in the model. Specifically, SES-IDW is first used to repair missing values, and the LSTM network is then employed to reconstruct the time series of continuous missing data. After the global ground-measured PM2.5 was reconstructed, the light gradient boosting machine model was applied to remote sensing estimation of both the original and the reconstructed ground-measured PM2.5 to further verify the performance of ST-SILM. Experimental results show that the estimation accuracy of the reconstructed dataset is better (R² for 2017, 2018, and 2019 increased by 0.02, 0.02, and 0.01, respectively, compared with the original dataset). Therefore, it is concluded that the proposed model can effectively reconstruct data from PM2.5 sites worldwide.
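
As an illustration of the first step, the following Python sketch combines textbook single exponential smoothing with textbook inverse distance weighting to estimate one missing value at a target site. It is not the authors' ST-SILM implementation: the abstract does not give the exact formulation, so the way the two techniques are combined here (smooth each neighbouring site's history, then weight the smoothed levels by distance) is an assumption. The function names `ses`, `idw`, and `fill_gap` and the parameters `alpha` and `power` are hypothetical.

```python
def ses(series, alpha=0.5):
    """Single exponential smoothing over a series; None marks a gap.

    Returns the smoothed level after the last observation,
    or None if the series has no observations at all.
    """
    level = None
    for x in series:
        if x is None:
            continue  # gap: carry the current level forward unchanged
        level = x if level is None else alpha * x + (1 - alpha) * level
    return level


def idw(values, distances, power=2.0):
    """Inverse distance weighted average of neighbour values."""
    num = den = 0.0
    for v, d in zip(values, distances):
        if v is None:
            continue  # neighbour has no usable value
        if d == 0:
            return v  # co-located neighbour: use its value directly
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den if den > 0 else None


def fill_gap(neighbour_histories, distances, alpha=0.5, power=2.0):
    """Assumed combination: smooth each neighbour in time, then
    interpolate the smoothed levels in space."""
    levels = [ses(h, alpha) for h in neighbour_histories]
    return idw(levels, distances, power)


# Two neighbouring sites, 1 and 2 distance units from the target site.
estimate = fill_gap([[10.0, 20.0], [30.0, None, 30.0]], [1.0, 2.0])
print(estimate)
```

In this sketch a larger `alpha` gives more weight to recent observations, and a larger `power` gives more weight to nearby sites. Long runs of consecutive missing values, which the abstract assigns to the LSTM step, are outside its scope.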

Keywords