Journal of the American Heart Association: Cardiovascular and Cerebrovascular Disease (Jun 2024)
Improving the Detection of Potential Cases of Familial Hypercholesterolemia: Could Machine Learning Be Part of the Solution?
Abstract
Background Familial hypercholesterolemia (FH), while highly prevalent, is a significantly underdiagnosed monogenic disorder. Improved detection could reduce the large number of cardiovascular events attributable to poor case finding. We aimed to assess whether machine learning algorithms outperform clinical diagnostic criteria (signs, history, and biomarkers) and the recommended screening criteria in the United Kingdom in identifying individuals with FH‐causing variants, presenting scalable screening criteria for general populations.
Methods and Results Analysis included UK Biobank participants with whole exome sequencing, classifying them as having FH when (likely) pathogenic variants were detected in their LDLR, APOB, or PCSK9 genes. Data were stratified into 3 data sets for (1) feature importance analysis; (2) deriving state‐of‐the‐art statistical and machine learning models; and (3) evaluating the models' predictive performance against clinical diagnostic and screening criteria: Dutch Lipid Clinic Network, Simon Broome, Make Early Diagnosis to Prevent Early Death, and Familial Case Ascertainment Tool. One thousand and three of 454 710 participants were classified as having FH. A Stacking Ensemble model yielded the best predictive performance (sensitivity, 74.93%; precision, 0.61%; accuracy, 72.80%; area under the receiver operating characteristic curve, 79.12%) and outperformed clinical diagnostic criteria and the recommended screening criteria in identifying FH variant carriers within the validation data set (figures for Familial Case Ascertainment Tool, the best baseline model, were 69.55%, 0.44%, 65.43%, and 71.12%, respectively). Our model decreased the number needed to screen compared with the Familial Case Ascertainment Tool (164 versus 227).
Conclusions Our machine learning–derived model provides a higher pretest probability of identifying individuals with a molecular diagnosis of FH compared with current approaches. This provides a promising, cost‐effective, scalable tool for implementation into electronic health records to prioritize potential FH cases for genetic confirmation.
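The reported numbers needed to screen are consistent with the reported precision values if the number needed to screen is taken as the reciprocal of precision (positive predictive value). The abstract does not state the formula, so this relationship is an assumption, and the function name below is hypothetical. A minimal sketch of the arithmetic:

```python
def number_needed_to_screen(precision_percent: float) -> int:
    """Approximate number of flagged individuals to test per true FH carrier.

    Assumption (not stated in the abstract): NNS = 1 / precision,
    rounded to the nearest whole person.
    """
    if precision_percent <= 0:
        raise ValueError("precision must be positive")
    # Precision is given in percent, so 100 / percent equals 1 / fraction.
    return round(100.0 / precision_percent)


# Precision values as reported in the abstract (percent).
stacking_ensemble_nns = number_needed_to_screen(0.61)
famcat_nns = number_needed_to_screen(0.44)

print(stacking_ensemble_nns)  # 164
print(famcat_nns)             # 227
```

Under this assumption, the gain in precision from 0.44% to 0.61% corresponds to about 63 fewer genetic tests per carrier identified.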
Keywords