Separations (Sep 2022)
Comparative Prediction of Gas Chromatographic Retention Indices for GC/MS Identification of Chemicals Related to Chemical Weapons Convention by Incremental and Machine Learning Methods
Abstract
During on-site verification activities conducted by the Technical Secretariat of Organization for the Prohibition of Chemical Weapons, identification by gas chromatography retention indices (RI) data, in addition to mass spectrometry data, increase the reliability of factual findings. However, reference RIs do not cover all the possible chemical structures. That is why it is important to have models to predict RIs. Applicable only for narrow data sets of chemicals with a fixed scaffold (G- and V-series gases as example), the non-learning incremental method demonstrated predictive median absolute and percentage errors of 2–4 units and 0.1–0.2%; these are comparable with the experimental bias in RI measurements in the same laboratory with the same GC conditions. It outperforms the accuracy of two reported machine learning methods–median absolute and percentage errors of 11–52 units and 0.5–2.8%. However, for the whole Chemical Weapons Convention (CWC) data set of chemicals, when a fixed scaffold is absent, the incremental method is not applicable; essential machine learning methods achieved accuracy: median absolute and percentage errors of 29–33 units and 0.5–2.2%, depending on the machine learning method. In addition, we have developed a homology tree approach as a convenient method for the visualization of the CWC chemical space. We conclude that non-learning incremental methods may be more accurate than the state-of-the-art machine learning techniques in particular cases, such as predicting the RIs of homologues and isomers of chemicals related to CWC.
Keywords