Calibration after bootstrap for accurate uncertainty quantification in regression models

Glenn Palmer; Siqi Du; Alexander Politowicz; Joshua Paul Emory; Xiyu Yang; Anupraas Gautam; Grishma Gupta; Zhelong Li; Ryan Jacobs; Dane Morgan

doi:10.1038/s41524-022-00794-8

npj Computational Materials (May 2022)

Affiliations

Glenn Palmer: Department of Computer Sciences, University of Wisconsin-Madison
Siqi Du: Department of Materials Science and Engineering, University of Wisconsin-Madison
Alexander Politowicz: Department of Materials Science and Engineering, University of Wisconsin-Madison
Joshua Paul Emory: Department of Materials Science and Engineering, University of Wisconsin-Madison
Xiyu Yang: Department of Materials Science and Engineering, University of Wisconsin-Madison
Anupraas Gautam: Department of Computer Sciences, University of Wisconsin-Madison
Grishma Gupta: Department of Computer Sciences, University of Wisconsin-Madison
Zhelong Li: Department of Materials Science and Engineering, University of Wisconsin-Madison
Ryan Jacobs: Department of Materials Science and Engineering, University of Wisconsin-Madison
Dane Morgan: Department of Materials Science and Engineering, University of Wisconsin-Madison

DOI: https://doi.org/10.1038/s41524-022-00794-8
Journal volume & issue: Vol. 8, no. 1
pp. 1 – 9

Abstract

Obtaining accurate estimates of machine learning model uncertainties on newly predicted data is essential for understanding the accuracy of the model and whether its predictions can be trusted. A common approach to such uncertainty quantification is to estimate the variance from an ensemble of models, which are often generated by the generally applicable bootstrap method. In this work, we demonstrate that the direct bootstrap ensemble standard deviation is not an accurate estimate of uncertainty, but that it can be calibrated with a simple procedure to dramatically improve its accuracy. We demonstrate the effectiveness of this calibration method for both synthetic data and numerous physical datasets from the field of Materials Science and Engineering. The approach is motivated by applications in physical and biological science but is quite general and should be applicable to uncertainty quantification in a wide range of machine learning regression models.
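As a rough illustration of the two ingredients the abstract names, the following sketch builds a bootstrap ensemble, takes the ensemble standard deviation as the raw uncertainty, and then rescales it on held-out data. Several details are assumptions rather than the authors' stated method: the linear calibration form σ_cal = a·σ + b with (a, b) chosen to minimize a Gaussian negative log-likelihood of held-out residuals, the toy least-squares line standing in for the regression model, the single held-out calibration split (instead of, for example, cross-validation), the coarse grid search in place of a proper optimizer, and all function names (`fit_line`, `bootstrap_ensemble`, `predict`, `gaussian_nll`, `calibrate`), which are hypothetical.

```python
import math
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = w*x + c; a toy stand-in for any regressor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    w = sxy / sxx if sxx > 0 else 0.0
    return w, my - w * mx


def bootstrap_ensemble(xs, ys, n_models, rng):
    """Fit one model per bootstrap resample (n draws with replacement)."""
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict(models, x):
    """Ensemble mean and uncalibrated ensemble standard deviation at x."""
    preds = [w * x + c for w, c in models]
    return statistics.fmean(preds), statistics.stdev(preds)


def gaussian_nll(residuals, sigmas):
    """Negative log-likelihood of residuals under N(0, sigma^2), constant terms dropped."""
    return sum(math.log(s) + r * r / (2 * s * s) for r, s in zip(residuals, sigmas))


def calibrate(residuals, sigmas):
    """Grid-search (a, b) for the ASSUMED form sigma_cal = a*sigma + b, minimizing the NLL."""
    best = None
    for i in range(81):          # a in [0, 20], step 0.25 (includes a = 1)
        for j in range(41):      # b in [0, 2], step 0.05 (includes b = 0)
            a, b = i * 0.25, j * 0.05
            scaled = [a * s + b for s in sigmas]
            if min(scaled) <= 0:  # a = b = 0 gives zero width; skip it
                continue
            nll = gaussian_nll(residuals, scaled)
            if best is None or nll < best[0]:
                best = (nll, a, b)
    return best[1], best[2]


# Synthetic data: y = 2x + 1 plus unit Gaussian noise.
rng = random.Random(0)


def make_data(n):
    xs = [rng.uniform(-3, 3) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0, 1.0) for x in xs]
    return xs, ys


x_tr, y_tr = make_data(200)   # training set for the ensemble
x_cal, y_cal = make_data(100)  # held-out set used only to fit (a, b)
models = bootstrap_ensemble(x_tr, y_tr, n_models=50, rng=rng)

# Residuals and raw ensemble spreads on the held-out set.
residuals, sigmas = [], []
for x, y in zip(x_cal, y_cal):
    mu, sd = predict(models, x)
    residuals.append(y - mu)
    sigmas.append(sd)

a, b = calibrate(residuals, sigmas)
mu, sd = predict(models, 0.5)
print(f"a={a:.2f}, b={b:.2f}, raw sd={sd:.3f}, calibrated sd={a * sd + b:.3f}")
```

On this toy problem the bootstrap spread reflects only the uncertainty in the fitted line, not the noise in y, so it typically falls well short of the actual held-out residuals, and the fitted (a, b) inflates it accordingly. With a real model the grid bounds, and whether an intercept b is needed at all, would need to be revisited.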

Published in npj Computational Materials

ISSN: 2057-3960 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Materials of engineering and construction. Mechanics of materials; Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: https://www.nature.com/npjcompumats/