Scientific Data (Aug 2024)

Retention time dataset for heterogeneous molecules in reversed-phase liquid chromatography

  • Yan Zhang,
  • Fei Liu,
  • Xiu Qin Li,
  • Yan Gao,
  • Kang Cong Li,
  • Qing He Zhang

DOI
https://doi.org/10.1038/s41597-024-03780-5
Journal volume & issue
Vol. 11, no. 1
pp. 1 – 9

Abstract

Quantitative structure–property relationships have been extensively studied for predicting retention times in liquid chromatography (LC). However, making transferable predictions is inherently difficult because retention times depend on both the structure of the molecule and the chromatographic method used. Despite decades of development and numerous published machine learning models, the practical application of small-molecule retention time prediction remains limited: the resulting models are typically restricted to the specific chromatographic conditions and molecules used in their training and evaluation. Here, we have developed a comprehensive dataset comprising over 10,000 experimental retention times, derived from 30 different reversed-phase liquid chromatography methods for a collection of 343 small molecules representing a wide range of chemical structures. These chromatographic methods encompass common LC setups for studying the retention behavior of small molecules and offer a wide range of examples for modeling retention time under different LC setups.