Post hoc calibration of medical segmentation models

Axel-Jan Rousseau; Thijs Becker; Simon Appeltans; Matthew Blaschko; Dirk Valkenborg

doi:10.1007/s42452-025-06587-0

Discover Applied Sciences (Feb 2025)

Affiliations

Axel-Jan Rousseau: CenStat, Data Science Institute, Hasselt University
Thijs Becker: AMO, Flemish Institute for Technological Research (VITO)
Simon Appeltans: CenStat, Data Science Institute, Hasselt University
Matthew Blaschko: Processing Speech and Images, Department of Electrical Engineering, KU Leuven
Dirk Valkenborg: CenStat, Data Science Institute, Hasselt University

DOI: https://doi.org/10.1007/s42452-025-06587-0
Journal volume & issue: Vol. 7, no. 3, pp. 1–17

Abstract

Background and objective: Deep neural networks have become the state of the art in medical image segmentation. However, the calibration of these models is an often overlooked aspect of their performance, even though calibrated outputs give the user an intuitive measure of uncertainty. While other uncertainty measures have been applied in segmentation, work using existing post hoc calibration methods is lacking.

Methods: In this paper, we investigate several post hoc calibration methods and introduce two straightforward extensions of Platt scaling and beta calibration that leverage the spatial information available in the segmentation map. We compare these methods on the BraTS 2018, ISLES 2018, and QUBIQ datasets.

Results: On average, the fine-tuning method, the isotonic regression method, and the extension of beta calibration performed best in terms of calibration: the Expected Calibration Error (ECE) decreased by 67.6%, 66%, and 65.5%, respectively. The segmentation performance, measured by the Dice score, dropped by 3.5%, 10.9%, and 4.4%, respectively. However, the Dice scores were negatively affected by one of the segmentation tasks.

Conclusion: Overall, the post hoc calibration methods improve the calibration of the outputs with only a small change in segmentation quality. We find that different methods perform better in different settings, indicating that a model selection approach can be an effective way to identify the most appropriate calibration method. We recommend applying these methods in medical image segmentation to improve the interpretability and statistical validity of the models.
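To make the two central quantities in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a binary (foreground/background) setting with per-pixel logits flattened into a list, computes ECE with equal-width bins over the predicted foreground probability (the paper's exact ECE definition and binning may differ), and fits standard Platt scaling as a plain maximum-likelihood logistic fit with a scalar slope and intercept. The function names `expected_calibration_error` and `fit_platt` are hypothetical, and the spatial extensions introduced in the paper are not shown.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def expected_calibration_error(probs, labels, n_bins=10):
    """Binary ECE: bin-weighted gap between mean predicted probability
    and observed foreground frequency (equal-width bins; an assumption)."""
    # Per bin: [count, sum of probabilities, sum of labels].
    stats = [[0, 0.0, 0.0] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        stats[b][0] += 1
        stats[b][1] += p
        stats[b][2] += y
    n = len(probs)
    # (count / n) * |mean prob - mean label| simplifies to |sum_p - sum_y| / n.
    return sum(abs(sp - sy) / n for c, sp, sy in stats if c > 0)


def fit_platt(logits, labels, n_iter=50):
    """Fit p = sigmoid(a * z + b) by Newton's method on the
    negative log-likelihood. Returns the pair (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        ga = gb = haa = hab = hbb = 0.0
        for z, y in zip(logits, labels):
            p = sigmoid(a * z + b)
            w = p * (1.0 - p)
            ga += (p - y) * z   # gradient with respect to a
            gb += (p - y)       # gradient with respect to b
            haa += w * z * z    # Hessian entries
            hab += w * z
            hbb += w
        det = haa * hbb - hab * hab
        if det <= 1e-12:
            break  # Hessian is (near) singular; stop with current values
        # Newton step: solve the 2x2 system H * step = gradient.
        da = (hbb * ga - hab * gb) / det
        db = (haa * gb - hab * ga) / det
        a, b = a - da, b - db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b
```

In use, `fit_platt` would be run on logits and labels from a held-out calibration set, and the calibrated probability for a new pixel with logit `z` would be `sigmoid(a * z + b)`. Comparing `expected_calibration_error` before and after the fit gives the kind of relative ECE reduction reported in the Results.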

Published in Discover Applied Sciences

ISSN: 3004-9261 (Online)
Publisher: Springer
Country of publisher: Switzerland
LCC subjects: Science: Science (General)
Website: https://link.springer.com/journal/42452