npj Computational Materials (Sep 2022)

Training data selection for accuracy and transferability of interatomic potentials

  • David Montes de Oca Zapiain,
  • Mitchell A. Wood,
  • Nicholas Lubbers,
  • Carlos Z. Pereyra,
  • Aidan P. Thompson,
  • Danny Perez

DOI
https://doi.org/10.1038/s41524-022-00872-x
Journal volume & issue
Vol. 8, no. 1
pp. 1 – 9

Abstract

Advances in machine learning (ML) have enabled the development of interatomic potentials that promise the accuracy of first-principles methods and the low cost and parallel efficiency of empirical potentials. However, ML-based potentials struggle to achieve transferability, i.e., to provide consistent accuracy across configurations that differ from those used during training. To realize the promise of ML-based potentials, systematic and scalable approaches for generating diverse training sets need to be developed. This work creates a diverse training set for tungsten in an automated manner using an entropy optimization approach. Subsequently, multiple polynomial and neural network potentials are trained on the entropy-optimized dataset. A corresponding set of potentials is trained on an expert-curated dataset for tungsten for comparison. The models trained on the entropy-optimized data exhibited superior transferability compared to the expert-curated models. Furthermore, the models trained on the expert-curated set exhibited a significant decrease in performance when evaluated on out-of-sample configurations.