Geoscientific Model Development (Feb 2022)

Deep-learning spatial principles from deterministic chemical transport models for chemical reanalysis: an application in China for PM<sub>2.5</sub>

  • B. Lyu,
  • R. Huang,
  • X. Wang,
  • W. Wang,
  • Y. Hu

DOI
https://doi.org/10.5194/gmd-15-1583-2022
Journal volume & issue
Vol. 15
pp. 1583 – 1594

Abstract

Well-estimated air pollutant concentration fields are critically important for compensating for sparse observations, especially over non-urban areas. Previous data fusion methods generally used statistical models to relate observations of target variables to proxy data and supporting variables at known stations. In this study, we developed a new data fusion paradigm by designing a deep-learning model framework and workflow that learns multivariable spatial correlations from chemical transport model (CTM) simulations and then uses them to estimate PM2.5 reanalysis fields from station observations. The model is composed of two modules: an explainable PointConv operation that pre-processes isolated observations, and a grid-to-grid regression network that builds correlations among multiple variables. The model was trained using only CTM simulations and supporting geographical covariates. The trained model was evaluated in two respects: (1) reproducing raw PM2.5 CTM simulations and (2) generating reanalysis and fused PM2.5 fields. First, the model reproduced the CTM simulations well over the full domain from CTM data sampled at sparse locations, with an average R² = 0.94 and RMSE = 4.85 µg m⁻³. Second, the fused PM2.5 fields estimated from observations performed well, with R² = 0.77 (RMSE = 14.29 µg m⁻³) at the stringent city level and R² = 0.84 (RMSE = 12.96 µg m⁻³) at the station level. The generated reanalysis PM2.5 fields have complete spatial coverage within the modeling domain. One significant benefit of the fusion framework is that model training does not rely on observations, so the trained model can be used to predict PM2.5 fields for newly set up observation networks, such as those using portable sensors. In the prediction procedure, only station observations are used, along with supporting covariates. The fusion model has high computing efficiency (< 1 s d⁻¹) owing to graphics processing unit (GPU) acceleration. As an alternative way to generate chemical reanalysis fields, the method can be readily implemented in near-real time and applied universally to other simulated variables for which measurements are available.
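
The first evaluation described in the abstract (rebuilding a full simulated field from values sampled at sparse locations, then scoring the result with R² and RMSE) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration: the random field stands in for a CTM simulation, the function names `sample_at_stations`, `rmse`, and `r2` are hypothetical, the constant-mean fill is only a placeholder where the trained network would go, and R² is computed as the coefficient of determination, which may differ from the definition used in the paper.

```python
import numpy as np


def sample_at_stations(field, rows, cols):
    """Keep field values at the station cells and set every other cell to NaN."""
    sparse = np.full(field.shape, np.nan)
    sparse[rows, cols] = field[rows, cols]
    return sparse


def rmse(pred, target):
    """Root-mean-square error over all grid cells."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))


def r2(pred, target):
    """Coefficient of determination (assumed definition of R^2)."""
    ss_res = np.sum((target - pred) ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


rng = np.random.default_rng(0)

# Synthetic stand-in for one simulated PM2.5 field (not real CTM output).
field = rng.gamma(shape=2.0, scale=20.0, size=(40, 60))

# Pick 100 distinct grid cells to play the role of monitoring stations.
n_stations = 100
flat_idx = rng.choice(field.size, size=n_stations, replace=False)
rows, cols = np.unravel_index(flat_idx, field.shape)

# Sparse input that a fusion model would receive.
sparse = sample_at_stations(field, rows, cols)

# Placeholder reconstruction: fill the whole grid with the station mean.
# A trained model would replace this step.
baseline = np.full(field.shape, np.nanmean(sparse))

print("RMSE:", rmse(baseline, field))
print("R2:  ", r2(baseline, field))
```

A constant fill cannot score above zero on this R² definition, so it serves only as a floor against which a learned reconstruction could be compared.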