npj Computational Materials (Mar 2024)

Leveraging language representation for materials exploration and discovery

  • Jiaxing Qu,
  • Yuxuan Richard Xie,
  • Kamil M. Ciesielski,
  • Claire E. Porter,
  • Eric S. Toberer,
  • Elif Ertekin

DOI
https://doi.org/10.1038/s41524-024-01231-8
Journal volume & issue
Vol. 10, no. 1
pp. 1 – 14

Abstract

Read online

Abstract Data-driven approaches to materials exploration and discovery are building momentum due to emerging advances in machine learning. However, parsimonious representations of crystals for navigating the vast materials search space remain limited. To address this limitation, we introduce a materials discovery framework that utilizes natural language embeddings from language models as representations of compositional and structural features. The contextual knowledge encoded in these language representations conveys information about material properties and structures, enabling both similarity analysis to recall relevant candidates based on a query material and multi-task learning to share information across related properties. Applying this framework to thermoelectrics, we demonstrate diversified recommendations of prototype crystal structures and identify under-studied material spaces. Validation through first-principles calculations and experiments confirms the potential of the recommended materials as high-performance thermoelectrics. Language-based frameworks offer versatile and adaptable embedding structures for effective materials exploration and discovery, applicable across diverse material systems.