PLoS ONE (Jan 2022)

Developing a common data model approach for DISCOVER CKD: A retrospective, global cohort of real-world patients with chronic kidney disease.

  • Supriya Kumar,
  • Matthew Arnold,
  • Glen James,
  • Rema Padman

DOI
https://doi.org/10.1371/journal.pone.0274131
Journal volume & issue
Vol. 17, no. 9
p. e0274131

Abstract

Read online

ObjectivesTo describe a flexible common data model (CDM) approach that can be efficiently tailored to study-specific needs to facilitate pooled patient-level analysis and aggregated/meta-analysis of routinely collected retrospective patient data from disparate data sources; and to detail the application of this CDM approach to the DISCOVER CKD retrospective cohort, a longitudinal database of routinely collected (secondary) patient data of individuals with chronic kidney disease (CKD).MethodsThe flexible CDM approach incorporated three independent, exchangeable components that preceded data mapping and data model implementation: (1) standardized code lists (unifying medical events from different coding systems); (2) laboratory unit harmonization tables; and (3) base cohort definitions. Events between different coding vocabularies were not mapped code-to-code; for each data source, code lists of labels were curated at the entity/event level. A study team of epidemiologists, clinicians, informaticists, and data scientists were included within the validation of each component.ResultsApplying the CDM to the DISCOVER CKD retrospective cohort, secondary data from 1,857,593 patients with CKD were harmonized from five data sources, across three countries, into a discrete database for rapid real-world evidence generation.ConclusionsThis flexible CDM approach facilitates evidence generation from real-world data within the DISCOVER CKD retrospective cohort, providing novel insights into the epidemiology of CKD that may expedite improvements in diagnosis, prognosis, early intervention, and disease management. The adaptable architecture of this CDM approach ensures scalable, fast, and efficient application within other therapy areas to facilitate the combined analysis of different types of secondary data from multiple, heterogeneous sources.