Scientific Reports (Apr 2024)
A machine-learning algorithm using claims data to identify patients with homozygous familial hypercholesterolemia
Abstract
Homozygous familial hypercholesterolemia (HoFH) is an underdiagnosed and undertreated ultra-rare disease. We used claims data from the Komodo Healthcare Map database to develop a machine-learning model to identify potential HoFH patients. We tokenized patients enrolled in MyRARE (a patient support program for those prescribed evinacumab-dgnb in the United States) and linked them with their Komodo claims. A true positive HoFH cohort (n = 331) was formed from MyRARE patients and patients with prescriptions for evinacumab-dgnb or lomitapide. The negative cohort (n = 1423) comprised patients with or at risk for cardiovascular disease. We split the combined cohort into training (80%) and testing (20%) sets. Overall, 10,616 candidate features were investigated; 87 were selected for their clinical relevance and importance to prediction performance. Different machine-learning algorithms were explored, and fast interpretable greedy-tree sums was selected as the final machine-learning tool because of its satisfactory performance and ease of interpretation. The model identified four useful features and yielded a precision (positive predictive value) of 0.98, recall (sensitivity) of 0.88, area under the receiver operating characteristic curve of 0.98, and accuracy of 0.97. The model performed well in identifying HoFH patients in the testing set, providing a useful tool to facilitate HoFH screening and diagnosis via healthcare claims data.
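As a minimal sketch of how precision, recall, and accuracy are defined for a binary classifier such as the one described above, the following Python snippet computes them from true and predicted labels. The function name `binary_metrics` and the toy labels are hypothetical and purely illustrative; they are not the study's code or data.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute precision, recall, and accuracy for binary labels (1 = positive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    # Precision (positive predictive value): share of predicted positives that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall (sensitivity): share of actual positives that are found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Accuracy: share of all predictions that are correct.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0

    return {"precision": precision, "recall": recall, "accuracy": accuracy}


# Hypothetical toy labels (NOT study data): 4 positives, 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In a cohort where negatives outnumber positives, as here, accuracy alone can look favorable even when positives are missed, which is one reason to report precision and recall alongside it.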