Clinical Epidemiology (Dec 2023)
Correctness and Completeness of Breast Cancer Diagnoses Recorded in UK CPRD Aurum and CPRD GOLD Databases: Comparison to Hospital Episode Statistics and Cancer Registry (Companion Paper 2)
Abstract
Katrina Wilcox Hagberg,1 Catherine Vasilakis-Scaramozza,2 Rebecca Persson,1 David Neasham,2 George Kafatos,2 Susan Jick1,3
1Epidemiology, Boston Collaborative Drug Surveillance Program, Lexington, MA, USA; 2Center for Observational Research, Amgen Ltd, Uxbridge, UK; 3Epidemiology, Boston University School of Public Health, Boston, MA, USA
Correspondence: Susan Jick, Boston Collaborative Drug Surveillance Program, 11 Muzzey Street, Lexington, MA, 02421, USA, Tel +1 781 862 6660, Fax +1 781 862 1680
Purpose: To evaluate the new Clinical Practice Research Datalink (CPRD) Aurum database, we estimated ‘correctness’ (ie accuracy, validity) and ‘completeness’ (ie presence, missingness) of malignant breast cancer diagnoses recorded in CPRD Aurum compared to external linked data sources (Hospital Episode Statistics (HES) Admitted Patient Care (APC), HES Outpatient (OP), and Cancer Registry (CR)) and to the previously validated CPRD GOLD.
Methods: Linkage-eligible female patients with an incident malignant breast cancer diagnosis recorded in at least one study data source were selected. Correctness was the proportion of malignant breast cancer cases recorded in CPRD Aurum or GOLD that also had a diagnosis recorded in HES APC/OP (2004–2019) or CR (2004–2016). Completeness was estimated by identifying all malignant breast cancer diagnoses in HES APC/OP or CR and calculating the proportion with a concordant diagnosis in CPRD Aurum or GOLD.
Results: Compared to HES APC/OP, there were 85,659 and 31,452 eligible patients in CPRD Aurum and GOLD, respectively, and correctness estimates were high (CPRD Aurum 83.5%, GOLD 81.7%). Compared to CR, there were 70,190 and 29,597 eligible patients in CPRD Aurum and GOLD, respectively; correctness was 89.1% for CPRD Aurum and 88.2% for GOLD. Completeness estimates for CPRD Aurum and GOLD were high (>90%). Diagnoses were recorded in CPRD Aurum within −7 to 74 days of those in the linked sources. Reasons for discordant diagnostic coding included the presence of treatment or other clinical codes only, diagnosis coded after the end of follow-up, non-malignant breast cancer in linked data, and administrative codes in lieu of diagnostic codes.
Conclusion: These results indicate that the correctness and completeness of malignant breast cancer diagnoses in CPRD Aurum were high and similar to those in CPRD GOLD. This provides confidence in the use of CPRD Aurum for research purposes. Where complete case capture is important, researchers should consider linkage to HES APC or CR.
Keywords: CPRD Aurum, CPRD GOLD, breast cancer, validation
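Both measures described in the Methods are simple proportions over sets of patients: correctness divides the cases found in both the GP database and the linked source by the cases in the GP database, while completeness divides the same overlap by the cases in the linked source. The minimal Python sketch below illustrates only that arithmetic. The function names and patient IDs are hypothetical, and the sketch deliberately ignores the study's eligibility rules, study periods, and date-window matching, which the real analysis applied.

```python
# Hypothetical sketch: correctness and completeness as set proportions.
# Function names and patient IDs are illustrative assumptions, not study data.
# Eligibility criteria, study periods, and diagnosis-date matching are ignored.


def correctness(gp_cases: set[str], reference_cases: set[str]) -> float:
    """Share of cases recorded in the GP database (e.g. CPRD Aurum or GOLD)
    that also have a diagnosis in the linked reference source (HES APC/OP or CR)."""
    if not gp_cases:
        raise ValueError("no cases recorded in the GP database")
    return len(gp_cases & reference_cases) / len(gp_cases)


def completeness(gp_cases: set[str], reference_cases: set[str]) -> float:
    """Share of cases in the linked reference source that also have a
    concordant diagnosis in the GP database."""
    if not reference_cases:
        raise ValueError("no cases recorded in the reference source")
    return len(reference_cases & gp_cases) / len(reference_cases)


if __name__ == "__main__":
    aurum = {"p1", "p2", "p3", "p4"}        # hypothetical GP-recorded cases
    hes = {"p1", "p2", "p3", "p5", "p6"}    # hypothetical linked-source cases
    # 3 shared cases: 3/4 of GP cases, 3/5 of linked-source cases
    print(f"correctness:  {correctness(aurum, hes):.1%}")   # prints 75.0%
    print(f"completeness: {completeness(aurum, hes):.1%}")  # prints 60.0%
```

The two measures share a numerator (the overlap) and differ only in the denominator, which is why a database can score high on one and lower on the other.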