Journal of Clinical and Translational Science (Apr 2024)

11 Novel Systematic Method for Identifying Congenital Anomaly Cases in Electronic Health Record Databases

  • Elly Brokamp,
  • Lisa Bastarache,
  • Nancy Cox,
  • Rizwan Hamid,
  • Nikhil K. Khanakari,
  • Gillian Hooker,
  • Megan Shuey

DOI
https://doi.org/10.1017/cts.2024.32
Journal volume & issue
Vol. 8
pp. 3 – 3

Abstract

OBJECTIVES/GOALS: Congenital anomalies (CAs) affect 3% of live births, yet the cause of 80% of CAs is unknown, and for the 20% with an identified cause, variability in penetrance suggests that additional risk drivers exist. Our method for identifying and categorizing CAs in electronic health record (EHR)-linked biobank databases can expand and improve CA etiologic research. METHODS/STUDY POPULATION: We identified individuals with CAs in three groups: (1) those with at least one CA; (2) those with multiple CAs (MCA), defined as two or more ‘major’ CAs; and (3) those with CAs in a specific organ system. We also created a novel quantitative approach, using phenome-wide association studies (pheWAS), for determining CA-associated genetic disease billing codes in order to separate individuals who have a known genetic cause for their CAs from those with idiopathic CAs. We updated CA phecodes, which are aggregates of clinical billing codes, and used them to identify CA cases in Vanderbilt’s EHR-linked biobank database, BioVU. We created a new phecode, ‘All CAs’, so that researchers can quickly identify all individuals with at least one CA. We evaluated the definition of MCA using pheWAS analyses to compare ‘minor’ vs ‘major’ CAs. RESULTS/ANTICIPATED RESULTS: The new CA phecode nomenclature includes 5.8 times more codes for CAs compared with the previous version (365 vs 56), improving granularity. Literature review identified 85 (19.7%) CA-associated genetic disease billing codes. PheWAS analyses revealed an additional 16 (3.7%) genetic disease billing codes with one or more significant (p < 2.75 × 10⁻⁵) associations with CA-related phecodes. Identifying CA-associated genetic disease billing codes allows researchers to differentiate between idiopathic CAs and those that have a known genetic cause. PheWAS analyses of individuals with CAs previously considered “minor” showed many associated severe health problems, revealing that the distinction between “minor” and “major” CAs when identifying individuals with MCA in the EHR is arbitrary. DISCUSSION/SIGNIFICANCE: Our CA identification method is scalable to the growing number of EHR-linked biobanks. Differentiating idiopathic CAs from those with known causes will increase power in studies seeking additional genetic drivers of CAs. Our novel method allows for expansion and acceleration of CA epidemiological research in EHR-linked biobank data.
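
As an illustration only, the grouping logic described in the methods could be sketched roughly as follows. This is not the authors' implementation: the code labels, the organ-system mapping, the `Person` class, and the `classify` function are all hypothetical placeholders, since the abstract does not list the actual phecodes or billing codes. The sketch also counts any two distinct CA codes as MCA, without a 'minor' vs 'major' split, which is a simplifying assumption.

```python
from dataclasses import dataclass, field

# Hypothetical lookup tables. Real CA phecodes and CA-associated genetic
# disease billing codes are not given in the abstract.
CA_PHECODES = {
    "CA_heart_1": "cardiovascular",
    "CA_heart_2": "cardiovascular",
    "CA_kidney_1": "genitourinary",
    "CA_limb_1": "musculoskeletal",
}
GENETIC_DISEASE_CODES = {"GEN_1", "GEN_2"}


@dataclass
class Person:
    person_id: str
    codes: set = field(default_factory=set)  # all codes found in the record


def classify(person, organ_system=None):
    """Return group flags for one person under the assumptions above."""
    ca_codes = {c for c in person.codes if c in CA_PHECODES}
    has_genetic_code = any(c in GENETIC_DISEASE_CODES for c in person.codes)
    return {
        # Group 1: at least one CA (the 'All CAs' idea)
        "any_ca": len(ca_codes) >= 1,
        # Group 2: multiple CAs, simplified to two or more distinct CA codes
        "multiple_ca": len(ca_codes) >= 2,
        # Group 3: at least one CA in the requested organ system
        "in_system": organ_system is not None
        and any(CA_PHECODES[c] == organ_system for c in ca_codes),
        # Idiopathic: has a CA but no CA-associated genetic disease code
        "idiopathic": len(ca_codes) >= 1 and not has_genetic_code,
    }


p1 = Person("p1", {"CA_heart_1", "CA_kidney_1"})
p2 = Person("p2", {"CA_limb_1", "GEN_1"})
r1 = classify(p1, organ_system="cardiovascular")
r2 = classify(p2, organ_system="cardiovascular")
```

Here `p1` would fall into all three groups and be treated as idiopathic, while `p2` has a single CA with a known genetic cause. In a real biobank the lookups would come from the curated phecode map and the list of CA-associated genetic disease codes rather than from hard-coded sets.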