Probing out-of-distribution generalization in machine learning for materials

Kangming Li; Andre Niyongabo Rubungo; Xiangyun Lei; Daniel Persaud; Kamal Choudhary; Brian DeCost; Adji Bousso Dieng; Jason Hattrick-Simpers

doi:10.1038/s43246-024-00731-w

Communications Materials (Jan 2025)

Affiliations

Kangming Li: Department of Materials Science and Engineering, University of Toronto
Andre Niyongabo Rubungo: Vertaix, Department of Computer Science, Princeton University
Xiangyun Lei: Toyota Research Institute
Daniel Persaud: Department of Materials Science and Engineering, University of Toronto
Kamal Choudhary: Material Measurement Laboratory, National Institute of Standards and Technology
Brian DeCost: Material Measurement Laboratory, National Institute of Standards and Technology
Adji Bousso Dieng: Vertaix, Department of Computer Science, Princeton University
Jason Hattrick-Simpers: Department of Materials Science and Engineering, University of Toronto

DOI: https://doi.org/10.1038/s43246-024-00731-w
Journal volume & issue: Vol. 6, no. 1, pp. 1–10

Abstract

Scientific machine learning (ML) aims to develop generalizable models, yet assessments of generalizability often rely on heuristics. Here, we demonstrate in the materials science setting that heuristic evaluations lead to biased conclusions of ML generalizability and benefits of neural scaling, through evaluations of out-of-distribution (OOD) tasks involving unseen chemistry or structural symmetries. Surprisingly, many tasks demonstrate good performance across models, including boosted trees. However, analysis of the materials representation space shows that most test data reside within regions well-covered by training data, while poorly-performing tasks involve data outside the training domain. For these challenging tasks, increasing training size or time yields limited or adverse effects, contrary to traditional neural scaling trends. Our findings highlight that most OOD tests reflect interpolation, not true extrapolation, leading to overestimations of generalizability and scaling benefits. This emphasizes the need for rigorously challenging OOD benchmarks.
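The distinction the abstract draws between a heuristic OOD split and true extrapolation can be illustrated with a toy sketch. This is not the authors' code or their coverage measure: the function names, the toy samples, the two-dimensional feature vectors, the Euclidean metric, and the radius threshold are all assumptions chosen for illustration. The sketch holds out every material containing one element, then asks what fraction of the held-out points still lie close to some training point in feature space.

```python
import math


def leave_one_element_out(samples, held_out):
    """Put every sample containing `held_out` in the test set, the rest in training."""
    train = [s for s in samples if held_out not in s["elements"]]
    test = [s for s in samples if held_out in s["elements"]]
    return train, test


def nearest_train_distance(x, train_features):
    """Euclidean distance from x to its closest training point."""
    return min(math.dist(x, t) for t in train_features)


def covered_fraction(train_features, test_features, radius):
    """Fraction of test points within `radius` of at least one training point."""
    covered = sum(
        nearest_train_distance(x, train_features) <= radius for x in test_features
    )
    return covered / len(test_features)


# Hypothetical toy data: element sets plus made-up 2D feature vectors.
samples = [
    {"elements": {"Fe", "O"}, "features": (0.0, 0.0)},
    {"elements": {"Ni", "O"}, "features": (1.0, 0.0)},
    {"elements": {"Cu", "O"}, "features": (0.5, 0.1)},  # held out, near training data
    {"elements": {"Cu", "H"}, "features": (5.0, 5.0)},  # held out, far from training data
]

train, test = leave_one_element_out(samples, "Cu")
train_X = [s["features"] for s in train]
test_X = [s["features"] for s in test]
frac = covered_fraction(train_X, test_X, radius=1.0)
print(f"{frac:.0%} of held-out points lie inside the training-covered region")
```

In this toy case, half of the nominally out-of-distribution test set sits inside the region covered by training data, so good performance on it would reflect interpolation. A real analysis would need a learned or engineered materials representation and a principled density or distance criterion in place of the fixed radius assumed here.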

Published in Communications Materials

ISSN: 2662-4443 (Online)
Publisher: Nature Portfolio
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Materials of engineering and construction. Mechanics of materials
Website: https://www.nature.com/commsmat/
