Discover Applied Sciences (Feb 2025)
Post hoc calibration of medical segmentation models
Abstract
Background and objective: Deep neural networks have become the state of the art in medical image segmentation. However, the calibration of these models is an often overlooked aspect of their performance, even though calibrated outputs give the user an intuitive measure of uncertainty. While other uncertainty measures have been applied in segmentation, work using existing post hoc calibration methods is lacking.

Methods: In this paper, we investigate several post hoc calibration methods and introduce two straightforward extensions of Platt scaling and beta calibration that leverage the spatial information available in the segmentation map. We compare these methods on the BraTS 2018, ISLES 2018, and QUBIQ datasets.

Results: On average, the fine-tuning method, the isotonic regression method, and the extension of beta calibration performed best in terms of calibration: the Expected Calibration Error (ECE) decreased by 67.6%, 66%, and 65.5%, respectively. The segmentation performance, measured by the Dice score, dropped by 3.5%, 10.9%, and 4.4%, respectively. However, the average Dice scores were negatively affected by one of the segmentation tasks.

Conclusion: Overall, the post hoc calibration methods improve the calibration of the outputs with only a small change in segmentation quality. Different methods perform better in different settings, which indicates that a model selection approach can be an effective way to identify the most appropriate calibration method. We recommend applying these methods in medical image segmentation to improve the interpretability and statistical validity of the models.
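For readers unfamiliar with the ECE metric reported above, the following is a minimal sketch of one common binned estimator for a binary segmentation map. It is an illustration under stated assumptions, not necessarily the exact estimator used in the paper: the function name `expected_calibration_error`, the choice of 10 equal-width bins, and the decision to bin on the predicted foreground probability are all assumptions made here.

```python
import numpy as np


def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE for binary segmentation (illustrative sketch).

    probs  : predicted foreground probabilities in [0, 1], any shape
    labels : ground-truth mask with values 0 or 1, same shape as probs
    n_bins : number of equal-width probability bins (assumed, not from the paper)
    """
    probs = np.asarray(probs, dtype=float).ravel()
    labels = np.asarray(labels, dtype=float).ravel()

    # Assign every voxel to an equal-width bin; p == 1.0 goes to the last bin.
    bin_ids = np.minimum((probs * n_bins).astype(int), n_bins - 1)

    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue  # empty bins contribute nothing
        # Gap between observed foreground frequency and mean predicted probability.
        gap = abs(labels[mask].mean() - probs[mask].mean())
        # Weight the gap by the fraction of voxels that fall in this bin.
        ece += mask.mean() * gap
    return ece


# Toy example: a slightly underconfident 2x2 prediction.
toy_probs = np.array([[0.05, 0.05], [0.95, 0.95]])
toy_labels = np.array([[0, 0], [1, 1]])
toy_ece = expected_calibration_error(toy_probs, toy_labels)
```

In this toy case every bin has a gap of 0.05 between predicted probability and observed frequency, so the ECE is about 0.05. A post hoc calibration method aims to reduce this value without retraining the segmentation network from scratch.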
Keywords