Geoscientific Model Development (Dec 2022)

Predicting peak daily maximum 8 h ozone and linkages to emissions and meteorology in Southern California using machine learning methods (SoCAB-8HR V1.0)

  • Z. Gao,
  • Y. Wang,
  • P. Vasilakos,
  • C. E. Ivey,
  • K. Do,
  • A. G. Russell

DOI
https://doi.org/10.5194/gmd-15-9015-2022
Journal volume & issue
Vol. 15
pp. 9015 – 9029

Abstract

The growing abundance of data is conducive to using numerical methods to relate air quality, meteorology and emissions, and so to identify which factors affect pollutant concentrations. Often, it is the extreme values that are of interest for health and regulatory purposes (e.g., the National Ambient Air Quality Standard for ozone uses the annual fourth-highest maximum daily 8 h average (MDA8) ozone), though such values are the most challenging to predict using empirical models. We developed four computational models (a generalized additive model (GAM), multivariate adaptive regression splines, random forest, and support vector regression) to derive observation-based relationships between the fourth-highest MDA8 ozone in the South Coast Air Basin and precursor emissions, meteorological factors and large-scale climate patterns. All models had similar predictive performance, though the GAM showed a relatively higher R² value (0.96) with a lower root mean square error and mean bias.
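
The abstract names the target quantity and the three evaluation metrics without defining them. The sketch below is illustrative only and is not the authors' code. It assumes that a year of daily MDA8 values is already available, and that R² means the coefficient of determination (the abstract does not say whether it is this or a squared correlation). The function names `fourth_highest` and `evaluate`, and all numbers in the usage lines, are hypothetical.

```python
import math


def fourth_highest(daily_mda8):
    """Return the fourth-highest value in one year's daily MDA8 series.

    Missing days (None or NaN) are skipped; regulatory completeness
    rules are NOT applied here (simplifying assumption).
    """
    valid = [v for v in daily_mda8 if v is not None and not math.isnan(v)]
    if len(valid) < 4:
        raise ValueError("need at least four valid daily values")
    return sorted(valid, reverse=True)[3]


def evaluate(observed, predicted):
    """Compute R^2 (coefficient of determination), RMSE and mean bias."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("observed and predicted must be equal-length, non-empty")
    residuals = [p - o for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(r * r for r in residuals)                 # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)    # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mean_bias": sum(residuals) / n,  # positive means over-prediction
    }


# Made-up values in ppb, for illustration only (not data from the paper).
example_year = [70.0, 95.0, 88.0, 102.0, 91.0, 60.0]
example_obs = [80.0, 90.0, 100.0, 110.0]
example_pred = [82.0, 88.0, 101.0, 109.0]

print(fourth_highest(example_year))
print(evaluate(example_obs, example_pred))
```

Mean bias is computed here as predicted minus observed, so a positive value indicates over-prediction; the opposite sign convention is also in use.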