IEEE Access (Jan 2021)

Employing and Interpreting a Machine Learning Target-Cognizant Technique for Analysis of Unknown Signals in Multiple Reaction Monitoring

  • Ryan A. Mccarthy,
  • Ananya Sen Gupta

DOI
https://doi.org/10.1109/ACCESS.2021.3056955
Journal volume & issue
Vol. 9
pp. 24727 – 24737

Abstract

This interdisciplinary work aims to develop a robust signal processing and autonomous machine learning framework that associates well-known (target) and potentially unknown (non-target) peaks present in raw gas chromatography-mass spectrometry (GC/MS/MS) instrument signals. In particular, it evaluates the ability of three machine learning algorithms to autonomously associate raw signal peaks, based on their training and testing accuracy. A target is a known congener that is expected to be present in the raw instrument signal; a non-target is an unknown or unexpected compound. Autonomously identifying target peaks in the GC/MS/MS signal and associating them with non-target peaks can help improve the analysis of collected samples. Association of peaks refers to classifying peaks as known congeners, regardless of whether a peak is a target or a non-target. The uncertainty of peaks fitted and discovered in the raw GC/MS/MS instrument signals is assessed to create topographical illustrations of target-annotated peaks among sample raw instrument signals collected across diverse locations in the Chicago area. The term “annotated peak” refers to a peak found at a specific retention time that has been assigned to a known congener. Adaptive signal processing techniques are used to smooth the data, correct baseline drift, and detect and separate coeluted (overlapped) peaks in the raw instrument signal to provide key feature extraction. A total of 150 air samples collected across Chicago, IL, are analyzed for individual polychlorinated biphenyls (PCBs) with GC/MS/MS. Eighty percent of the data is used to train the classification of target PCBs, and the remaining 20% is evaluated to identify and associate consistently occurring non-target peaks with target PCBs. A random forest classifier is used to associate identified peaks with target PCB peaks. Geographical topographical representations of target PCBs in the raw instrument signal demonstrate how PCBs accumulate and degrade in certain locations.
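
The preprocessing steps named in the abstract (smoothing, baseline-drift correction, and peak detection) can be illustrated with a minimal sketch. This is not the authors' adaptive method: it substitutes a plain moving average, a straight-line baseline estimated from the edges of the trace, and a simple local-maximum test. The synthetic trace, the function names, and every parameter value (window size, edge length, height threshold) are assumptions chosen for illustration only. Separation of coeluted peaks and the random forest step are not covered.

```python
import math

def gaussian(t, center, height, width):
    """Idealized chromatographic peak (hypothetical shape)."""
    return height * math.exp(-0.5 * ((t - center) / width) ** 2)

def moving_average(signal, window=5):
    """Smooth with a centered moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def subtract_linear_baseline(signal, edge=10):
    """Remove a straight-line drift estimated from the first and last
    `edge` samples, which are assumed to contain no peaks."""
    n = len(signal)
    left = sum(signal[:edge]) / edge
    right = sum(signal[-edge:]) / edge
    x0 = (edge - 1) / 2            # center of the left edge region
    x1 = n - 1 - (edge - 1) / 2    # center of the right edge region
    slope = (right - left) / (x1 - x0)
    return [s - (left + slope * (i - x0)) for i, s in enumerate(signal)]

def find_peaks(signal, min_height):
    """Return indices of local maxima at or above `min_height`."""
    return [
        i for i in range(1, len(signal) - 1)
        if signal[i] >= min_height
        and signal[i] > signal[i - 1]
        and signal[i] >= signal[i + 1]
    ]

# Synthetic trace (assumed, not measured data): two peaks, a linear
# baseline drift, and a small alternating ripple standing in for noise.
raw = [
    gaussian(i, 60, 10.0, 4.0)
    + gaussian(i, 130, 6.0, 4.0)
    + 1.0 + 0.02 * i
    + 0.05 * (-1) ** i
    for i in range(200)
]

smoothed = moving_average(raw, window=5)
corrected = subtract_linear_baseline(smoothed, edge=10)
peaks = find_peaks(corrected, min_height=2.0)
print(peaks)
```

In this toy setting the sample indices play the role of retention times. A real pipeline would then extract features (for example retention time, height, and area) from each detected peak before passing them to a classifier.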

Keywords