Natural Sciences (Oct 2022)
Convolutional neural network prediction of molecular properties for aerosol chemistry and health effects
Abstract
Quinones are chemical compounds commonly found in air particulate matter (PM). Their redox activity can generate reactive oxygen species (ROS) and contribute to the oxidative potential (OP) of PM, leading to adverse health effects of aerosols. The quinones' OP and ability to form ROS are linked to their reduction potential (RP, measured in volts), a metric for the tendency to gain electrons in redox reactions. Here, we use convolutional neural networks (CNNs) as quantitative structure‐activity relationship (QSAR) models to relate the one‐electron RP of quinones to their molecular structure. For CNN training and testing, a data set of more than 100,000 quinones with associated RP values derived from density functional theory calculations was encoded as simplified molecular input line entry system (SMILES) strings. The best performing CNN model achieved a root mean square error (RMSE) of 0.115 V for an independent test data set and outperformed linear regression models fitted on common molecular descriptors (≥ 0.140 V RMSE). Augmentation methods were newly adapted or applied to support CNN training with smaller data sets, reducing RMSE by up to approximately 37% for a data set of 321 molecules. Adjusted for solvent effects, the CNN‐derived RP predictions showed good agreement with experimental data. Using the newly developed method, we identified a subset of atmospherically relevant quinones that are likely to have a high OP and play a role in aerosol health effects, which remains to be further elucidated by experimental studies. We suggest using the presented machine learning approach in further investigations of atmospheric aerosol chemistry and health effects as well as in other studies that require a target‐oriented screening of the properties and effects of large classes of substances.
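As an illustration of the two technical ingredients named above, the following minimal sketch shows one way a SMILES string could be turned into a matrix suitable as CNN input, and how an RMSE in volts is computed. It is not the encoding used in this study: the character-level tokenization, the zero padding, and the function names `build_vocab`, `one_hot`, and `rmse` are assumptions made for this example, and the RP values in the usage lines are invented placeholders rather than results.

```python
import math


def build_vocab(smiles_list):
    """Map every character seen in the SMILES strings to a column index.

    Assumption: one token per character. Two-letter atoms such as Cl or Br
    would be split by this simplification.
    """
    chars = sorted({ch for s in smiles_list for ch in s})
    return {ch: i for i, ch in enumerate(chars)}


def one_hot(smiles, vocab, max_len):
    """Return a max_len x len(vocab) matrix of 0/1 values.

    Rows beyond the end of the string stay all-zero (padding).
    """
    matrix = [[0] * len(vocab) for _ in range(max_len)]
    for pos, ch in enumerate(smiles[:max_len]):
        matrix[pos][vocab[ch]] = 1
    return matrix


def rmse(predicted, reference):
    """Root mean square error between two equally long sequences."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Two quinones written as SMILES: 1,4-benzoquinone and 1,4-naphthoquinone.
quinones = ["O=C1C=CC(=O)C=C1", "O=C1C=CC(=O)c2ccccc12"]
vocab = build_vocab(quinones)
max_len = max(len(s) for s in quinones)
encoded = [one_hot(s, vocab, max_len) for s in quinones]

# Placeholder RP values in volts, invented purely to exercise rmse().
example_error = rmse([0.10, -0.20], [0.00, 0.00])
```

Each molecule becomes a fixed-size matrix with one row per string position and one column per symbol, which is the kind of grid-like input a convolutional layer can slide over. A production tokenizer would treat multi-character atoms and bracketed groups as single tokens.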
Key Points
1. Convolutional neural networks can be used to estimate unknown physical and chemical properties of chemical substances and outperform additive group contribution methods.
2. Augmentation methods help mitigate the prevalent problem of limited data availability.
3. Quinone species detected in the environment are screened for potential relevance in atmospheric chemistry and public health, and the results are presented in this study.
Keywords