Earth System Science Data (May 2024)

LGHAP v2: a global gap-free aerosol optical depth and PM<sub>2.5</sub> concentration dataset since 2000 derived via big Earth data analytics

  • K. Bai,
  • K. Bai,
  • K. Li,
  • L. Shao,
  • X. Li,
  • C. Liu,
  • Z. Li,
  • M. Ma,
  • D. Han,
  • Y. Sun,
  • Z. Zheng,
  • R. Li,
  • N.-B. Chang,
  • J. Guo

Journal volume & issue
Vol. 16
pp. 2425 – 2448


Read online

The Long-term Gap-free High-resolution Air Pollutants (LGHAP) concentration dataset generated in our previous study has provided spatially contiguous daily aerosol optical depth (AOD) and fine particulate matter (PM2.5) concentrations at a 1 km grid resolution in China since 2000. This advancement empowered unprecedented assessments of regional aerosol variations and their influence on the environment, health, and climate over the past 20 years. However, there is a need to enhance such a high-quality AOD and PM2.5 concentration dataset with new robust features and extended spatial coverage. In this study, we present version 2 of a global-scale LGHAP dataset (LGHAP v2), which was generated using improved big Earth data analytics via a seamless integration of versatile data science, pattern recognition, and machine learning methods. Specifically, multimodal AODs and air quality measurements acquired from relevant satellites, ground monitoring stations, and numerical models were harmonized by harnessing the capability of random-forest-based data-driven models. Subsequently, an improved tensor-flow-based AOD reconstruction algorithm was developed to weave the harmonized multisource AOD products together for filling data gaps in Multi-Angle Implementation of Atmospheric Correction (MAIAC) AOD retrievals from Terra. The results of the ablation experiments demonstrated better performance of the improved tensor-flow-based gap-filling method in terms of both convergence speed and data accuracy. Ground-based validation results indicated good data accuracy of this global gap-free AOD dataset, with a correlation coefficient (R) of 0.85 and a root mean square error (RMSE) of 0.14 compared to the worldwide AOD observations from the AErosol RObotic NETwork (AERONET), outperforming the purely reconstructed AODs (R = 0.83, RMSE = 0.15), but they were slightly worse than raw MAIAC AOD retrievals (R = 0.88, RMSE = 0.11). For PM2.5 concentration mapping, a novel deep-learning approach, termed the SCene-Aware ensemble learning Graph ATtention network (SCAGAT), was hereby applied. While accounting for the scene representativeness of data-driven models across regions, the SCAGAT algorithm performed better during spatial extrapolation, largely reducing modeling biases over regions with limited and/or even absent in situ PM2.5 concentration measurements. The validation results indicated that the gap-free PM2.5 concentration estimates exhibit higher prediction accuracies, with an R of 0.95 and an RMSE of 5.7 µg m−3, compared to PM2.5 concentration measurements obtained from former holdout sites worldwide. Overall, while leveraging state-of-the-art methods in data science and artificial intelligence, a quality-enhanced LGHAP v2 dataset was generated through big Earth data analytics by cohesively weaving together multimodal AODs and air quality measurements from diverse sources. The gap-free, high-resolution, and global coverage merits render the LGHAP v2 dataset an invaluable database for advancing aerosol- and haze-related studies as well as triggering multidisciplinary applications for environmental management, health-risk assessment, and climate change attribution. All gap-free AOD and PM2.5 concentration grids in the LGHAP v2 dataset, as well as the data user guide and relevant visualization codes, are publicly accessible at (last access: 3 April 2024, Bai and Li, 2023a).