Pulmonary Circulation (Nov 2019)
Utilising artificial intelligence to determine patients at risk of a rare disease: idiopathic pulmonary arterial hypertension
Abstract
Idiopathic pulmonary arterial hypertension is a rare and life-shortening condition often diagnosed at an advanced stage. Despite increased awareness, the delay to diagnosis remains unchanged. This study explores whether a predictive model based on healthcare resource utilisation can be used to screen large populations to identify patients at high risk of idiopathic pulmonary arterial hypertension. Hospital Episode Statistics from the National Health Service in England, providing close to full national coverage, were used as a measure of healthcare resource utilisation. Data for patients with idiopathic pulmonary arterial hypertension from the National Pulmonary Hypertension Service in Sheffield were linked to pre-diagnosis Hospital Episode Statistics records. A non-idiopathic pulmonary arterial hypertension control cohort was selected from the Hospital Episode Statistics population. Patient history was limited to ≤5 years pre-diagnosis. Information on demographics, timing/frequency of diagnoses, medical specialities visited and procedures undertaken was captured. For modelling, a bagged gradient boosting trees algorithm was used to discriminate between cohorts. Between 2008 and 2016, 709 patients with idiopathic pulmonary arterial hypertension were identified and compared with a stratified cohort of 2,812,458 patients classified as non-idiopathic pulmonary arterial hypertension with ≥1 ICD-10 coded diagnosis of relevance to idiopathic pulmonary arterial hypertension. A predictive model was developed and validated using cross-validation. The timing and frequency of the clinical speciality seen, secondary diagnoses and age were key variables driving the algorithm’s performance. To identify the 100 patients at highest risk of idiopathic pulmonary arterial hypertension, 969 patients would need to be screened with a specificity of 99.99% and sensitivity of 14.10% based on a prevalence of 5.5/million. The positive predictive and negative predictive values were 10.32% and 99.99%, respectively. This study highlights the potential application of artificial intelligence to readily available real-world data to screen for rare diseases such as idiopathic pulmonary arterial hypertension. This algorithm could provide low-cost screening at a population level, facilitating earlier diagnosis, improved diagnostic rates and patient outcomes. Studies to further validate this approach are warranted.