Wellcome Open Research (Jun 2024)
Estimating disease burden using national linked electronic health records: a study using an English population-based cohort. [version 2; peer review: 2 approved]
Abstract
Background Electronic health records (EHRs) have the potential to be used to produce detailed disease burden estimates. In this study we created disease estimates using national EHRs for three high-burden conditions, compared estimates between linked and unlinked datasets, and produced estimates stratified by age, sex, ethnicity, socio-economic deprivation and geographical region.
Methods We used EHRs containing primary care (Clinical Practice Research Datalink, CPRD), secondary care (Hospital Episode Statistics, HES) and mortality (Office for National Statistics, ONS) records. We used existing disease phenotyping algorithms to identify cases of cancer (breast, lung, colorectal and prostate), type 1 and type 2 diabetes, and low back pain. We calculated age-standardised incidence of first cancer, point prevalence for diabetes, and primary care consultation prevalence for low back pain.
Results We included 7.2 million people contributing 45.3 million person-years of active follow-up between 2000 and 2014. Sex-specific lung and bowel cancer incidence estimates from combined CPRD-HES and combined CPRD-HES-ONS data were similar to cancer registry estimates. Linked CPRD-HES estimates for combined type 1 and type 2 diabetes were consistently higher than those from CPRD alone, with the difference steadily increasing over time from 0.26 percentage points (2.99% for CPRD-HES vs. 2.73% for CPRD) in 2002 to 0.58 percentage points (6.17% vs. 5.59%) in 2013. Low back pain prevalence was highest in the most deprived quintile, and the gap relative to the least deprived quintile widened between 2000 and 2013, with the largest relative difference of 27% (558.70 vs. 438.20 per 10,000 people) in 2013.
Conclusions We used national EHRs to produce detailed burden of disease estimates by deprivation, ethnicity and geographical region. National EHRs have the potential to improve disease burden estimates at a local and global level and may serve as more automated, timely and precise inputs for policy making and global burden of disease estimation.
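The Results quote two kinds of difference: the diabetes gap between CPRD-HES and CPRD is a subtraction of two prevalence percentages, whereas the low back pain gap between deprivation quintiles is expressed relative to the least deprived quintile. The following minimal sketch reproduces that arithmetic from the figures reported in the abstract only. The function names are illustrative and are not taken from the study's code.

```python
def absolute_difference(a: float, b: float) -> float:
    """Return a - b in the units of the inputs (here, percentage points)."""
    return a - b


def relative_difference_pct(a: float, b: float) -> float:
    """Return the excess of a over b as a percentage of b."""
    return (a - b) / b * 100.0


# Diabetes point prevalence (%), CPRD-HES vs. CPRD alone, as reported in the abstract.
diabetes_gap_2002 = absolute_difference(2.99, 2.73)
diabetes_gap_2013 = absolute_difference(6.17, 5.59)

# Low back pain consultation prevalence per 10,000 people in 2013,
# most deprived vs. least deprived quintile, as reported in the abstract.
back_pain_gap_2013 = relative_difference_pct(558.70, 438.20)

print(f"Diabetes gap 2002: {diabetes_gap_2002:.2f} percentage points")
print(f"Diabetes gap 2013: {diabetes_gap_2013:.2f} percentage points")
print(f"Low back pain gap 2013: {back_pain_gap_2013:.1f}% relative difference")
```

Keeping the two measures in separate functions makes it explicit which comparison is absolute and which is relative, since both are reported with a percent sign in the text.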