Clinical Epidemiology (Apr 2024)

A Harmonised Approach to Curating Research-Ready Datasets for Asthma, Chronic Obstructive Pulmonary Disease (COPD) and Interstitial Lung Disease (ILD) in England, Wales and Scotland Using Clinical Practice Research Datalink (CPRD), Secure Anonymised Information Linkage (SAIL) Databank and DataLoch

  • Hatam S,
  • Scully ST,
  • Cook S,
  • Evans HT,
  • Hume A,
  • Kallis C,
  • Farr I,
  • Orton C,
  • Sheikh A,
  • Quint JK

Journal volume & issue
Vol. 16
pp. 235 – 247

Abstract

Sara Hatam,1,* Sean Timothy Scully,2,* Sarah Cook,3,* Hywel T Evans,2,* Alastair Hume,4 Constantinos Kallis,3 Ian Farr,2 Chris Orton,2 Aziz Sheikh,1 Jennifer K Quint3

1Usher Institute, The University of Edinburgh, Edinburgh, UK; 2Population Data Science, Swansea University Medical School, Swansea, UK; 3School of Public Health, Imperial College London, London, UK; 4EPCC, The University of Edinburgh, Edinburgh, UK

*These authors contributed equally to this work

Correspondence: Jennifer K Quint, Email [email protected]

Background: Electronic healthcare records (EHRs) are an important resource for health research that can be used to improve patient outcomes in chronic respiratory diseases. However, consistent approaches to analysing these datasets are needed for coherent messaging and for comparative studies across different populations.

Methods and Results: We developed a harmonised curation approach to generate comparable patient cohorts for asthma, chronic obstructive pulmonary disease (COPD) and interstitial lung disease (ILD) using datasets from within Clinical Practice Research Datalink (CPRD; for England), Secure Anonymised Information Linkage (SAIL; for Wales) and DataLoch (for Scotland) by defining commonly derived variables consistently between the datasets. By working in parallel on the curation methodology used for CPRD, SAIL and DataLoch for asthma, COPD and ILD, we were able to highlight key differences in coding and recording between the databases and identify solutions to enable valid comparisons.

Conclusion: The codelists and metadata generated have been made available to help re-create the asthma, COPD and ILD cohorts in CPRD, SAIL and DataLoch for different time periods, and to provide a starting point for the curation of respiratory datasets in other EHR databases, expediting further comparable respiratory research.

Keywords: COPD, asthma, ILD, EHR, harmonisation, data curation

Keywords