Green Analytical Chemistry (Mar 2024)

ViridisChem's Chemical Database: Assessing the quality of experimental property data and the reliability of predicted values

  • Karuna Anna Sajeevan,
  • Kirsten Sinclair Rosselot,
  • Renu Vyas

Journal volume & issue
Vol. 8
p. 100101

Abstract

Toxicity and toxicology values are important because they are used to assess the potential harmful effects of substances on the environment and living organisms, including humans. Toxicity awareness enables scientists, regulators, and healthcare professionals to make well-informed decisions regarding pharmaceuticals, chemicals, cosmetics, and consumer products, thereby safeguarding public health and the environment. To identify compound toxicities, diverse evaluation approaches have been developed, including computational methods that offer cost-effective, humane solutions. However, the development of computational methods is hampered by the limited availability of high-quality data on the experimental properties and environmental fate that are needed to estimate toxicological properties. Researchers face severe data-quality issues: missing chemical identifiers, incorrect identifiers, duplicated entries, conflicting experimental property values from different sources, and estimated data mislabeled as experimental data by the source datasets. Unfortunately, there are no performance standards for the quality of entries in chemical property mega databases, and the only metrics by which these databases are compared are the number of entries they hold and the number of data sources they include. Here, we illustrate a means of assessing the quality of the experimental property data collected in ViridisChem's Chemical Database (www.viridischem.com) as well as the performance of the predictive models it uses to fill data gaps. Randomly selected experimental property data records for Henry's Law and boiling point had pass rates of 100% and 93%, respectively, during quality assurance. Linear regression of predicted property values against experimental values for melting point, boiling point, flash point, bioconcentration factor, soil adsorption coefficient, thermal conductivity, and surface tension resulted in R² values ranging from 0.91 to 0.995, while lower R² values were observed for octanol-water partition coefficient, water solubility, and vapor pressure.
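
As an illustration of the regression-based comparison described in the abstract, the following is a minimal sketch of how an R² value can be computed for predicted versus experimental property values using ordinary least squares. It is not the authors' implementation: the function name linear_fit_r2 and the boiling-point numbers are hypothetical and serve only to show the calculation.

```python
def linear_fit_r2(experimental, predicted):
    """Fit predicted = slope * experimental + intercept by ordinary least
    squares and return (slope, intercept, r_squared).

    Hypothetical helper for illustration; not taken from the paper.
    """
    n = len(experimental)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")

    mean_x = sum(experimental) / n
    mean_y = sum(predicted) / n

    # Sums of squares and cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in experimental)
    syy = sum((y - mean_y) ** 2 for y in predicted)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(experimental, predicted))

    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when either variable is constant")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For simple linear regression, R^2 equals the squared Pearson correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Made-up boiling points in degrees Celsius, for illustration only.
experimental_bp = [56.0, 78.4, 100.0, 110.6, 138.4]
predicted_bp = [58.1, 76.9, 101.7, 108.2, 140.0]

slope, intercept, r2 = linear_fit_r2(experimental_bp, predicted_bp)
print(f"slope={slope:.3f}, intercept={intercept:.2f}, R^2={r2:.4f}")
```

An R² close to 1 indicates that predicted values track experimental values closely along a straight line. It does not by itself show that the slope is near 1 or the intercept near 0, so those are worth inspecting as well.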

Keywords