Geoscientific Model Development (Aug 2022)
A machine learning methodology for the generation of a parameterization of the hydroxyl radical
Abstract
We present a methodology that uses gradient-boosted regression trees (a machine learning technique) and a full-chemistry simulation from a chemistry–climate model (CCM), which serves as the training dataset, to efficiently generate a parameterization of tropospheric hydroxyl radical (OH) that is a function of chemical, dynamical, and solar irradiance variables. This surrogate model of OH is designed to be integrated into a CCM and to allow for computationally efficient simulation of nonlinear feedbacks between OH and tropospheric constituents whose primary sink is reaction with OH (e.g., carbon monoxide (CO), methane (CH4), volatile organic compounds (VOCs)). Such a model framework is advantageous for studies that require multi-decadal simulations of CH4 or multi-year sensitivity simulations to understand the causes of trends and variations of CO and CH4. To allow the user to easily target the training dataset towards a desired application, we outline a methodology for generating a parameterization of OH rather than presenting an “off-the-shelf” version of a parameterization to be incorporated into a CCM. This approach makes it relatively easy to create a new parameterization in response to, for example, changes in research goals or the underlying CCM chemistry and/or dynamics schemes. We show that a sample parameterization of OH generated from a CCM simulation is able to reproduce OH concentrations with a normalized root-mean-square error of approximately 5 % and capture the global mean methane lifetime within approximately 1 %. The calculated accuracy of the parameterization assumes that inputs are within the bounds of the training dataset. Large excursions from these bounds will likely decrease the overall accuracy. However, we show that the sample parameterization predicts large deviations in OH for an El Niño event that was not part of the training dataset and that the spatial distribution and strength of these deviations are consistent with the event.
This result gives confidence in the fidelity of a parameterization developed with our methodology to simulate the spatial and temporal responses of OH to perturbations from large variations in the chemical, dynamical, and solar irradiance drivers of OH. In addition, we discuss how two machine learning metrics, Gain feature importance and Shapley additive explanations (SHAP) values, indicate that the behavior of a parameterization of OH generally accords with our understanding of OH chemistry, even though there are no physics- or chemistry-based constraints on the parameterization.
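As an illustration of two ideas mentioned in the abstract, the following minimal Python sketch computes a normalized root-mean-square error and flags inputs that fall outside the range of a training dataset. It is not the authors' implementation. The function names, the sample values, and the choice to normalize the RMSE by the mean of the reference values are all assumptions made for illustration; the abstract does not state which normalization is used.

```python
from math import sqrt


def nrmse(predicted, reference):
    """RMSE divided by the mean of the reference values.

    Assumption: normalization by the reference mean; other conventions
    (range, standard deviation) are also in common use.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(reference)
    return sqrt(mse) / (sum(reference) / len(reference))


def training_bounds(rows):
    """Return a (min, max) pair for each feature column of the training rows."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def out_of_bounds(row, bounds):
    """Return indices of features in `row` that lie outside the training range."""
    return [i for i, (x, (lo, hi)) in enumerate(zip(row, bounds)) if x < lo or x > hi]


# Hypothetical OH concentrations (molecules per cm^3): full-chemistry
# reference values versus values from a surrogate model.
reference_oh = [1.0e6, 2.0e6, 3.0e6]
predicted_oh = [1.1e6, 1.9e6, 3.0e6]
error = nrmse(predicted_oh, reference_oh)

# Hypothetical training inputs: (CO in ppb, temperature in K).
train_rows = [(100.0, 290.0), (150.0, 300.0), (80.0, 275.0)]
bounds = training_bounds(train_rows)
flagged = out_of_bounds((200.0, 280.0), bounds)  # CO exceeds the training maximum
```

A check of this kind could be run before the surrogate model is evaluated, so that predictions made from inputs far outside the training range are treated with more caution.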