Journal of Cheminformatics (Jan 2023)

New algorithms demonstrate untargeted detection of chemically meaningful changing units and formula assignment for HRMS data of polymeric mixtures in the open-source constellation web application

  • Dane R. Letourneau,
  • Dennis D. August,
  • Dietrich A. Volmer

DOI
https://doi.org/10.1186/s13321-023-00680-5
Journal volume & issue
Vol. 15, no. 1
pp. 1 – 13

Abstract

The field of high-resolution mass spectrometry (HRMS) and its ancillary hyphenated techniques is expanding and evolving rapidly. As the popularity of HRMS instruments grows, so does the need for tools that simplify and automate the processing of the large, complex datasets these analyses produce. Constellation is one such tool, developed by our group over the last two years to perform unsupervised trend detection for repeating polymeric units in HRMS data of complex mixtures such as natural organic matter, oil, or lignin. In this work, we develop two new unsupervised algorithms for finding chemically meaningful changing units in HRMS data and incorporate a molecular-formula-finding algorithm from the open-source CoreMS software package, all demonstrated here in the Constellation software environment. The algorithms are evaluated on a collection of open-source HRMS datasets containing polymeric analytes (PEG 400; NIST Standard Reference Material 1950, Metabolites in Frozen Human Plasma; and a swab extract containing polymers). They successfully identify all known changing units in the data and assign the correct formulas. Through these new developments, we are excited to add to a growing body of open-source software specialized in extracting useful information from complex datasets without the high costs, technical knowledge, and processing demands typically associated with such tools.
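
To make the idea of a "changing unit" concrete, the following minimal sketch searches a peak list for the most frequently recurring mass difference and compares it with the monoisotopic mass of the PEG repeat unit, C2H4O. It is an illustration only and is not the algorithm implemented in Constellation or CoreMS. The function names, the rounding tolerance, the search window, and the synthetic peak list (including the starting m/z) are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations

# Monoisotopic masses (Da) of the elements used in this example.
MONO = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}


def formula_mass(counts):
    """Monoisotopic mass of a formula given as {element: count}."""
    return sum(MONO[el] * n for el, n in counts.items())


def most_common_difference(mz_values, min_diff=10.0, max_diff=100.0, decimals=3):
    """Return (difference, count) for the most frequent pairwise m/z gap.

    Hypothetical helper: gaps are binned by rounding to `decimals` places,
    a crude stand-in for a proper mass-tolerance window.
    """
    diffs = Counter()
    for a, b in combinations(sorted(mz_values), 2):
        d = b - a
        if min_diff <= d <= max_diff:
            diffs[round(d, decimals)] += 1
    if not diffs:
        return None
    return diffs.most_common(1)[0]


# PEG repeat unit C2H4O, about 44.0262 Da.
PEG_UNIT = formula_mass({"C": 2, "H": 4, "O": 1})

# Synthetic, singly charged series: an arbitrary starting m/z plus n repeat
# units, with two unrelated peaks added as noise (all values are made up).
base = 217.1046
peaks = [base + n * PEG_UNIT for n in range(6)] + [150.0, 301.5]

diff, count = most_common_difference(peaks)
print(f"most common gap: {diff} Da, seen {count} times")
print(f"PEG unit mass:   {PEG_UNIT:.4f} Da")
```

On real spectra, rounding would be replaced by a ppm-based tolerance, and charge states and isotopologue peaks would need handling before any gap is interpreted as a repeat unit.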

Keywords