npj Computational Materials (Sep 2022)
Training data selection for accuracy and transferability of interatomic potentials
Abstract
Advances in machine learning (ML) have enabled the development of interatomic potentials that promise the accuracy of first-principles methods together with the low cost and parallel efficiency of empirical potentials. However, ML-based potentials struggle to achieve transferability, i.e., to provide consistent accuracy on configurations that differ from those used during training. To realize the promise of ML-based potentials, systematic and scalable approaches for generating diverse training sets must be developed. In this work, a diverse training set for tungsten is generated automatically using an entropy-optimization approach. Multiple polynomial and neural network potentials are then trained on this entropy-optimized dataset. For comparison, a corresponding set of potentials is trained on an expert-curated tungsten dataset. The models trained on the entropy-optimized data exhibited superior transferability compared with those trained on the expert-curated data. Furthermore, the models trained on the expert-curated set exhibited a significant decrease in performance when evaluated on out-of-sample configurations.
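The abstract does not state the entropy objective or the descriptors used. As a hedged illustration only, the sketch below assumes that each candidate configuration has already been reduced to a fixed-length descriptor vector. It uses a nearest-neighbor (Kozachenko-Leonenko) entropy estimate as the objective and selects greedily from a fixed candidate pool. The functions `nn_entropy_proxy` and `greedy_entropy_selection` and the toy pool are hypothetical and do not represent the authors' implementation.

```python
import math


def nn_entropy_proxy(points):
    """Mean log nearest-neighbor distance of a set of descriptor vectors.

    Up to a positive scale factor (the dimension) and additive constants that
    depend only on set size and dimension, this equals the Kozachenko-Leonenko
    nearest-neighbor entropy estimate. Values are therefore only comparable
    between sets of the same size and dimension.
    """
    n = len(points)
    if n < 2:
        return float("-inf")
    total = 0.0
    for i, p in enumerate(points):
        d_min = min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        total += math.log(d_min + 1e-12)  # small offset guards against log(0)
    return total / n


def greedy_entropy_selection(candidates, k):
    """Hypothetical helper: greedily pick k vectors maximizing nn_entropy_proxy.

    Starts from the first candidate. At each step it adds the candidate whose
    inclusion gives the largest entropy proxy, i.e. the most spread-out set.
    """
    remaining = list(candidates)
    selected = [remaining.pop(0)]
    while len(selected) < k and remaining:
        best = max(
            range(len(remaining)),
            key=lambda i: nn_entropy_proxy(selected + [remaining[i]]),
        )
        selected.append(remaining.pop(best))
    return selected


# Toy 2-D "descriptors": three near-duplicate configurations near the origin
# plus three distinct ones at the other corners of a unit square.
pool = [(0.0, 0.0), (0.01, 0.0), (0.0, 0.01), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
chosen = greedy_entropy_selection(pool, k=4)
print(chosen)
```

Near-duplicate descriptors shrink nearest-neighbor distances and lower the estimate. Maximizing it therefore favors configurations unlike those already selected, which is one way to make "diverse" concrete. The greedy, fixed-pool search here is a simplification chosen for brevity.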