A probabilistic approach for building disease phenotypes across electronic health records

David Vidmar; Jessica De Freitas; Will Thompson; John M. Pfeifer; Brandon K. Fornwalt; Noah Zimmerman; Riccardo Miotto; Ruijun Chen

doi:10.1186/s13040-025-00454-9

BioData Mining (Jun 2025)

Affiliations

David Vidmar: Tempus AI
Jessica De Freitas: Tempus AI
Will Thompson: Tempus AI
John M. Pfeifer: Tempus AI
Brandon K. Fornwalt: Tempus AI
Noah Zimmerman: Tempus AI
Riccardo Miotto: Tempus AI
Ruijun Chen: Tempus AI

DOI: https://doi.org/10.1186/s13040-025-00454-9
Journal volume & issue: Vol. 18, no. 1, pp. 1–13

Abstract

Background: Identifying the set of patients with a particular disease diagnosis across electronic health records (EHRs), referred to as a phenotype, is an important step in clinical research and applications. However, this task is often challenging because incomplete data can render definitive classifications impossible. We propose a probabilistic approach to phenotyping that is based on Bayesian inference and does not require gold-standard labels. In this paper, we develop multiple heuristic “labeling functions” (LFs) for 4 diseases across de-identified EHR data and aggregate their votes using three methods: a majority vote approach (MV), a popular open-source approach (Snorkel OSS), and our proposed probabilistic approach (LEVI). We compare the resulting phenotypes to those built using expert-curated logic from the literature, as well as to those from an off-the-shelf natural language processing pipeline (Medspacy), using a curated sample of physician-reviewed labels for evaluation.

Results: Phenotypes built using LFs achieve better classification performance than off-the-shelf alternatives (F1 scores of 0.79–0.82 vs. 0.68 for expert logic and 0.55 for Medspacy). Compared to output scores from Snorkel OSS, LEVI provides better probabilistic performance (expected calibration error of 0.04 vs. 0.12), ROC AUC estimates (interval score [loss] of 0.03 vs. 0.10), and operating point selection (equal-cost net benefit of 0.18 vs. 0.15).

Conclusions: For challenging disease states, phenotyping using probabilities rather than binary classification can lead to improved and more personalized downstream decision-making. Probabilistic phenotypes built using LEVI exhibit low calibration error without the need for labels, allowing for better risk-benefit tradeoffs.

Published in BioData Mining

ISSN: 1756-0381 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Mathematics: Analysis
Website: https://biodatamining.biomedcentral.com/
