Machine learning to identify chronic cough from administrative claims data

Vishal Bali; Vladimir Turzhitsky; Jonathan Schelfhout; Misti Paudel; Erin Hulbert; Jesse Peterson-Brandt; Jeffrey Hertzberg; Neal R. Kelly; Raja H. Patel

doi:10.1038/s41598-024-51522-9

Scientific Reports (Jan 2024)

Affiliations

Vishal Bali: Center for Observational and Real-World Evidence (CORE)
Vladimir Turzhitsky: Center for Observational and Real-World Evidence (CORE)
Jonathan Schelfhout: Center for Observational and Real-World Evidence (CORE)
Misti Paudel: Health Economics and Outcomes Research (HEOR)
Erin Hulbert: Health Economics and Outcomes Research (HEOR)
Jesse Peterson-Brandt: Health Economics and Outcomes Research (HEOR)
Jeffrey Hertzberg: OptumLabs
Neal R. Kelly: OptumLabs
Raja H. Patel: OptumLabs

Journal volume & issue: Vol. 14, no. 1, pp. 1–10

Abstract

Accurate identification of patient populations is an essential component of clinical research, especially for medical conditions such as chronic cough that are inconsistently defined and diagnosed. We aimed to develop and compare machine learning models to identify chronic cough from medical and pharmacy claims data. In this retrospective observational study, we compared three machine learning algorithms based on XGBoost, logistic regression, and neural network approaches using a large claims and electronic health record database. Of the 327,423 patients who met the study criteria, 4,818 had chronic cough based on linked claims–electronic health record data. The XGBoost model showed the best performance, achieving a receiver operating characteristic area under the curve (ROC-AUC) of 0.916. We selected a cutoff that favors a high positive predictive value (PPV) to minimize false positives, resulting in a sensitivity, specificity, PPV, and negative predictive value of 18.0%, 99.6%, 38.7%, and 98.8%, respectively, on the held-out testing set (n = 82,262). The logistic regression and neural network models achieved lower ROC-AUCs of 0.907 and 0.838, respectively. The XGBoost and logistic regression models maintained their robust performance in subgroups of individuals with higher rates of chronic cough. Machine learning algorithms are one way of identifying conditions that are not coded in medical records, and can help identify individuals with chronic cough from claims data with a high degree of classification value.
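
The four metrics reported above all follow from a confusion matrix at a chosen score cutoff. The sketch below is a minimal illustration of that arithmetic and of how raising the cutoff trades sensitivity for PPV. It is not the authors' code: the function name `classification_metrics`, the toy labels, and the toy scores are invented for illustration and do not come from the study.

```python
from math import nan


def classification_metrics(y_true, scores, cutoff):
    """Return sensitivity, specificity, PPV, and NPV at a score cutoff.

    A patient is predicted positive when score >= cutoff.
    """
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= cutoff
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 1:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # Undefined when no cases fall in the denominator.
        return numerator / denominator if denominator else nan

    return {
        "sensitivity": ratio(tp, tp + fn),  # true positives among actual positives
        "specificity": ratio(tn, tn + fp),  # true negatives among actual negatives
        "ppv": ratio(tp, tp + fp),          # true positives among predicted positives
        "npv": ratio(tn, tn + fn),          # true negatives among predicted negatives
    }


# Toy data, invented for illustration only (1 = chronic cough, 0 = no chronic cough).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.2, 0.8, 0.3, 0.1, 0.1, 0.2, 0.05, 0.3]

lower_cutoff = classification_metrics(y_true, scores, cutoff=0.5)
higher_cutoff = classification_metrics(y_true, scores, cutoff=0.85)

print("cutoff 0.50:", lower_cutoff)
print("cutoff 0.85:", higher_cutoff)
```

In this toy example, moving the cutoff from 0.5 to 0.85 removes the single false positive, so PPV rises while sensitivity stays low. When a condition is rare, as chronic cough is in the study population, a cutoff chosen for high PPV will typically yield high specificity and modest sensitivity.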

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/
