Machine Learning: Science and Technology (Jan 2025)

LLM4Mat-bench: benchmarking large language models for materials property prediction

  • Andre Niyongabo Rubungo,
  • Kangming Li,
  • Jason Hattrick-Simpers,
  • Adji Bousso Dieng

DOI
https://doi.org/10.1088/2632-2153/add3bb
Journal volume & issue
Vol. 6, no. 2
p. 020501

Abstract

Large language models (LLMs) are increasingly being used in materials science. However, little attention has been given to benchmarking and standardized evaluation of LLM-based materials property prediction, which hinders progress. We present LLM4Mat-Bench, the largest benchmark to date for evaluating the performance of LLMs in predicting the properties of crystalline materials. LLM4Mat-Bench contains about 1.9M crystal structures in total, collected from 10 publicly available materials data sources, and covers 45 distinct properties. It features three input modalities: crystal composition, CIF, and crystal text description, with 4.7M, 615.5M, and 3.1B tokens in total for each modality, respectively. We use LLM4Mat-Bench to fine-tune models of different sizes, including LLM-Prop and MatBERT, and provide zero-shot and few-shot prompts to evaluate the property prediction capabilities of chat-style LLMs, including Llama, Gemma, and Mistral. The results highlight the challenges general-purpose LLMs face in materials science and the need for task-specific predictive models and task-specific instruction-tuned LLMs for materials property prediction.
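The zero-shot evaluation described above can be illustrated with a minimal sketch: a prompt is built from a crystal text description and a target property, and a numeric value is parsed from the model's free-text reply. The prompt wording, field names, and parsing strategy here are illustrative assumptions, not the benchmark's actual templates.

```python
import re
from typing import Optional

def build_zero_shot_prompt(description: str, property_name: str, unit: str) -> str:
    # Hypothetical zero-shot prompt template; LLM4Mat-Bench's real wording may differ.
    return (
        "You are a materials scientist.\n"
        f"Crystal description: {description}\n"
        f"Predict the {property_name} (in {unit}) of this material. "
        "Answer with a single number."
    )

def parse_numeric_answer(reply: str) -> Optional[float]:
    # Extract the first number (optional sign, decimal, exponent) from the reply;
    # returns None when the model gives no usable numeric answer.
    m = re.search(r"-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?", reply)
    return float(m.group()) if m else None

# Example usage with a placeholder description and a canned model reply.
prompt = build_zero_shot_prompt(
    "NaCl crystallizes in the rock-salt (Fm-3m) structure ...",
    "band gap",
    "eV",
)
value = parse_numeric_answer("The band gap is approximately 5.0 eV.")
print(value)  # 5.0
```

A parser of this kind matters in practice: chat-style models often wrap the prediction in explanatory text, so evaluation pipelines must robustly recover the number (or mark the response invalid) before computing regression metrics.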

Keywords