Geoscientific Model Development (May 2024)

Diagnosing drivers of PM2.5 simulation biases in China from meteorology, chemical composition, and emission sources using an efficient machine learning method

  • S. Wang,
  • M. Zhang,
  • Y. Gao,
  • P. Wang,
  • Q. Fu,
  • H. Zhang

DOI
https://doi.org/10.5194/gmd-17-3617-2024
Journal volume & issue
Vol. 17
pp. 3617–3629

Abstract


Chemical transport models (CTMs) are widely used for air pollution modeling, but they suffer from significant biases owing to uncertainties in simplified parameterizations, meteorological fields, and emission inventories. Accurate diagnosis of simulation biases is critical for improving models, interpreting results, and managing air quality, especially for simulations of fine particulate matter (PM2.5). In this study, an efficient method based on a tree-based machine learning (ML) algorithm, the light gradient boosting machine (LightGBM), was designed to diagnose CTM simulation biases quickly and with low computational resource requirements. The drivers of the biases in PM2.5 concentrations simulated by the Community Multiscale Air Quality (CMAQ) model, relative to observations, were diagnosed from the perspectives of meteorology, chemical composition, and emission sources. A source-oriented version of CMAQ was used to diagnose the influences of different emission sources on PM2.5 biases. The ML model captures the complex relationships between the input variables and the simulation bias well; meteorology, PM2.5 components, and source sectors can each partially explain the simulation bias. CMAQ underestimated PM2.5 concentrations in 2019, with biases ranging from −19.25 to −2.66 µg m⁻³, especially in winter and spring and during high-PM2.5 events. Among all components, secondary organic components made the largest contribution to the PM2.5 simulation bias across regions and seasons (13.8 %–22.6 %). Relative humidity, cloud cover, and soil surface moisture were the main meteorological factors contributing to PM2.5 bias in the North China Plain, the Pearl River Delta, and northwestern China, respectively. Primary and secondary inorganic components from residential sources made the two largest contributions to this bias (12.05 % and 12.78 %), implying large uncertainties in this sector.
ML-based methods provide a valuable complement to traditional mechanism-based methods for model improvement, offering high efficiency and low reliance on prior information.
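The abstract does not give the training setup, so the following is only a rough illustration of the general idea: treat the bias (simulated minus observed PM2.5) as the regression target, fit a gradient-boosted tree ensemble on candidate drivers, and express each driver's share of the model's split gain as a percentage. To stay self-contained, a tiny pure-Python gradient-boosted regression-stumps model stands in for LightGBM; it is not LightGBM and omits its histogram binning and leaf-wise tree growth. Using gain-based importance as the "contribution %" is an assumption, and the study's actual attribution method may differ. The feature names (`rh`, `cloud_cover`, `soil_moisture`), helper functions, and data are all hypothetical and synthetic.

```python
import random
from typing import Dict, List, Tuple

Stump = Tuple[int, float, float, float]  # (feature index, threshold, left value, right value)


def best_stump(X: List[List[float]], r: List[float]) -> Tuple[int, float, float, float, float]:
    """Exhaustively find the one split that most reduces the squared error of residuals r.

    Returns (feature index, threshold, left mean, right mean, SSE reduction)."""
    n = len(r)
    total = sum(r)
    parent_term = total * total / n
    best = (0, float("inf"), total / n, total / n, 0.0)  # "no useful split" fallback
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += r[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:  # no threshold can separate equal feature values
                continue
            right_sum = total - left_sum
            # SSE reduction of a mean-valued split: S_L^2/n_L + S_R^2/n_R - S^2/n
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - parent_term
            if gain > best[4]:
                best = (j, (lo + hi) / 2, left_sum / k, right_sum / (n - k), gain)
    return best


def apply_stump(stump: Stump, x: List[float]) -> float:
    j, thr, left, right = stump
    return left if x[j] <= thr else right


def fit_boosted_stumps(X, y, n_rounds=50, lr=0.1):
    """Squared-loss gradient boosting: each round fits one stump to the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps: List[Stump] = []
    gain_by_feature = [0.0] * len(X[0])
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        j, thr, left, right, gain = best_stump(X, residuals)
        if gain <= 0.0:
            break
        stump = (j, thr, lr * left, lr * right)  # shrink leaf values by the learning rate
        stumps.append(stump)
        gain_by_feature[j] += gain  # accumulate split gain per feature
        pred = [p + apply_stump(stump, x) for p, x in zip(pred, X)]
    return base, stumps, gain_by_feature


def contribution_percent(gains: List[float], names: List[str]) -> Dict[str, float]:
    """Normalize per-feature gain to percentages of the total gain."""
    total = sum(gains)
    return {nm: (100.0 * g / total if total > 0 else 0.0) for nm, g in zip(names, gains)}


# Synthetic, made-up data: the "model" underestimates PM2.5 more at high humidity.
rng = random.Random(0)
names = ["rh", "cloud_cover", "soil_moisture"]  # hypothetical drivers
X = [[rng.uniform(0, 100), rng.uniform(0, 1), rng.uniform(0, 0.5)] for _ in range(200)]
obs = [40.0 + 0.5 * x[0] + rng.gauss(0, 5) for x in X]
sim = [o - 0.3 * x[0] + 2.0 * x[1] + rng.gauss(0, 1) for o, x in zip(obs, X)]
bias = [s - o for s, o in zip(sim, obs)]  # simulated minus observed

mean_bias = sum(bias) / len(bias)
base, stumps, gains = fit_boosted_stumps(X, bias)
pct = contribution_percent(gains, names)
print(f"mean bias = {mean_bias:.2f} ug/m3")
print({k: round(v, 1) for k, v in pct.items()})
```

With real data, the stump loop would normally be replaced by the `lightgbm` package; for example, `lightgbm.LGBMRegressor(importance_type="gain")` exposes comparable per-feature gain totals through its `feature_importances_` attribute. Gain-based shares are cheap to compute but describe the fitted model rather than a causal decomposition, which is why per-sample attribution methods are often preferred for interpretation.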