Machine Learning: Science and Technology (Jan 2024)

Bridging the gap between high-level quantum chemical methods and deep learning models

  • Viki Kumar Prasad,
  • Alberto Otero-de-la-Roza,
  • Gino A DiLabio

DOI
https://doi.org/10.1088/2632-2153/ad27e1
Journal volume & issue
Vol. 5, no. 1
p. 015035

Abstract


Supervised deep learning (DL) models are becoming ubiquitous in computational chemistry because they can efficiently learn complex input-output relationships and predict chemical properties at a cost significantly lower than methods based on quantum mechanics. The central challenge in many DL applications is the need to invest considerable computational resources in generating large ($N > 1 \times 10^{5}$) training sets such that the resulting DL model can be generalized reliably to unseen systems. The lack of better alternatives has encouraged the use of low-cost and relatively inaccurate density-functional theory (DFT) methods to generate training data, leading to DL models that lack accuracy and reliability. In this article, we describe a robust and easily implemented approach based on property-specific atom-centered potentials (ACPs) that resolves this central challenge in DL model development. ACPs are one-electron potentials that are applied in combination with a computationally inexpensive but inaccurate quantum mechanical method (e.g. double-ζ DFT) and fitted against relatively few high-level data ($N \approx 1 \times 10^{3}$–$1 \times 10^{4}$), possibly obtained from the literature. The resulting ACP-corrected methods retain the low cost of the double-ζ DFT approach, while generating high-level-quality data in unseen systems for the specific property for which they were designed. With this approach, we demonstrate that ACPs can be used as an intermediate method between high-level approaches and DL model development, enabling the calculation of large and accurate DL training sets for the chemical property of interest. We demonstrate the effectiveness of the proposed approach by predicting bond dissociation enthalpies, reaction barrier heights, and reaction energies with chemical accuracy at a computational cost lower than the DFT methods routinely used for DL training data set generation.
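
The data flow described in the abstract (fit a correction on a small high-level reference set, then use the corrected low-cost method to label a much larger training set) can be illustrated with a toy sketch. This is not the authors' ACP fitting procedure: real ACPs are one-electron potentials applied within the quantum mechanical calculation itself, whereas the example below assumes a simple additive linear correction to a scalar output. The functions `high_level`, `low_level`, and `fit_linear_correction`, the scalar descriptor `x`, and all numerical values are hypothetical stand-ins chosen only to show the three stages.

```python
import random


def high_level(x):
    # Hypothetical stand-in for an expensive, accurate reference method.
    return 2.0 * x + 1.0


def low_level(x):
    # Hypothetical stand-in for a cheap method with a systematic error.
    return 1.5 * x + 0.2


def fit_linear_correction(xs, residuals):
    # Ordinary least squares fit of residual ~ a * x + b (toy assumption).
    n = len(xs)
    mean_x = sum(xs) / n
    mean_r = sum(residuals) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxr = sum((x - mean_x) * (r - mean_r) for x, r in zip(xs, residuals))
    a = sxr / sxx
    b = mean_r - a * mean_x
    return a, b


rng = random.Random(0)

# Stage 1: a small reference set (about 1e3 systems) with high-level values.
small_xs = [rng.uniform(0.0, 10.0) for _ in range(1000)]
residuals = [high_level(x) - low_level(x) for x in small_xs]
a, b = fit_linear_correction(small_xs, residuals)


def corrected_low_level(x):
    # Cheap method plus the fitted correction; no high-level calls needed.
    return low_level(x) + a * x + b


# Stage 2: label a much larger set (about 1e5 systems) at low cost.
large_xs = [rng.uniform(0.0, 10.0) for _ in range(100_000)]
training_set = [(x, corrected_low_level(x)) for x in large_xs]

# Stage 3 (not shown): train a DL model on training_set.
# Here the high-level function is only evaluated to check the toy labels.
max_error = max(abs(y - high_level(x)) for x, y in training_set)
print(f"a = {a:.3f}, b = {b:.3f}, max label error = {max_error:.2e}")
```

In this toy setting the low-level error is exactly linear, so the fitted correction recovers the reference values almost perfectly; in practice the quality of the corrected labels depends on how well the correction transfers to unseen systems, which is what the article evaluates for ACPs.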

Keywords