Artificial Intelligence Chemistry (Dec 2024)

Evaluation of machine learning models for the accelerated prediction of density functional theory calculated 19F chemical shifts based on local atomic environments

  • Sophia Li,
  • Emma Wang,
  • Leia Pei,
  • Sourodeep Deb,
  • Prashanth Prabhala,
  • Sai Hruday Reddy Nara,
  • Raina Panda,
  • Shiven Eltepu,
  • Marx Akl,
  • Larry McMahan,
  • Edward Njoo

Journal volume & issue
Vol. 2, no. 2
p. 100078

Abstract

The introduction of fluorine into compounds plays a crucial role in drug development, as it greatly influences their final pharmacokinetic and pharmacodynamic properties. Given the prevalence of fluorine in FDA-approved drugs in recent years, identifying the mechanisms that drive the chemical transformations of fluorinated compounds has become crucial in drug discovery. 19F NMR spectroscopy is a powerful analytical technique for examining fluorine-containing compounds, offering valuable information about their structure, dynamics, and reactivity. NMR spectra can be interpreted with the aid of Density Functional Theory (DFT). However, the computational cost of DFT limits the screening of compounds and the discovery of feasible drug candidates. Here, we present a machine learning approach to accelerate the prediction of DFT-calculated 19F NMR chemical shifts. The features of each fluorine atom were derived from its local three-dimensional environment, represented by the neighboring atoms within a radius of n Å of that fluorine atom in the compound. A comparative analysis of thirteen regression models was conducted using features extracted from 501 fluorinated compounds in our laboratory’s chemical inventory. Among the models, Gradient Boosting Regression (GBR) performed best, achieving a mean absolute error of 3.31 ppm with a local environment radius of 3 Å. This accuracy is comparable to that of DFT calculations, while computational time is reduced from several hundred seconds to milliseconds. A radius of 3 Å was also found to be optimal across all models when encoding features for local atomic environments.
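
The local-environment encoding described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the atom record layout, the function names `local_environment` and `feature_vector`, the choice of (atomic number, distance) pairs as features, the zero padding, and the toy coordinates are all assumptions made for illustration. Only the idea of collecting neighbors within a fixed radius (3 Å here) of a fluorine atom comes from the abstract.

```python
import math

# Assumed lookup table covering only the elements used in this sketch.
ATOMIC_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8, "F": 9, "S": 16, "Cl": 17}


def local_environment(atoms, center_index, radius=3.0):
    """Return (atomic_number, distance) pairs for every atom within
    `radius` angstroms of the atom at `center_index`, nearest first.

    `atoms` is a list of (symbol, x, y, z) tuples (hypothetical layout).
    """
    _, cx, cy, cz = atoms[center_index]
    neighbors = []
    for i, (symbol, x, y, z) in enumerate(atoms):
        if i == center_index:
            continue  # the center atom is not its own neighbor
        d = math.dist((cx, cy, cz), (x, y, z))
        if d <= radius:
            neighbors.append((ATOMIC_NUMBER[symbol], d))
    neighbors.sort(key=lambda pair: pair[1])
    return neighbors


def feature_vector(atoms, center_index, radius=3.0, max_neighbors=4):
    """Flatten the nearest `max_neighbors` neighbors into a fixed-length
    list [Z1, d1, Z2, d2, ...], padded with zeros (an assumed encoding)."""
    env = local_environment(atoms, center_index, radius)[:max_neighbors]
    vec = []
    for z, d in env:
        vec.extend([float(z), d])
    vec.extend([0.0] * (2 * max_neighbors - len(vec)))
    return vec


# Toy coordinates in angstroms, chosen for illustration only
# (not an optimized geometry).
atoms = [
    ("F", 0.00, 0.00, 0.00),
    ("C", 1.38, 0.00, 0.00),
    ("H", 1.74, 1.03, 0.00),
    ("H", 5.00, 0.00, 0.00),  # lies outside the 3 angstrom sphere
]
features = feature_vector(atoms, center_index=0, radius=3.0)
```

Fixed-length vectors of this kind, one per fluorine atom, could then be paired with DFT-calculated shifts and passed to any standard regressor; the abstract reports that Gradient Boosting Regression gave the lowest error among the thirteen models compared.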

Keywords