Journal of Medical Internet Research (Nov 2021)

Using Artificial Intelligence With Natural Language Processing to Combine Electronic Health Record’s Structured and Free Text Data to Identify Nonvalvular Atrial Fibrillation to Decrease Strokes and Death: Evaluation and Case-Control Study

  • Peter L Elkin,
  • Sarah Mullin,
  • Jack Mardekian,
  • Christopher Crowner,
  • Sylvester Sakilay,
  • Shyamashree Sinha,
  • Gary Brady,
  • Marcia Wright,
  • Kimberly Nolen,
  • JoAnn Trainer,
  • Ross Koppel,
  • Daniel Schlegel,
  • Sashank Kaushik,
  • Jane Zhao,
  • Buer Song,
  • Edwin Anand

DOI
https://doi.org/10.2196/28946
Journal volume & issue
Vol. 23, no. 11
p. e28946

Abstract

Background: Nonvalvular atrial fibrillation (NVAF) affects almost 6 million Americans and is a major contributor to stroke, but it is significantly underdiagnosed and undertreated despite explicit guidelines for oral anticoagulation.

Objective: The aim of this study is to investigate whether semisupervised natural language processing (NLP) of free-text information in the electronic health record (EHR), combined with structured EHR data, improves NVAF discovery and treatment, and whether it may offer a method to prevent thousands of deaths and save billions of dollars.

Methods: We abstracted 96,681 participants from the University of Buffalo faculty practice’s EHR. NLP was used to index the notes. We then compared the ability to identify NVAF and to compute two scores: CHA2DS2-VASc (congestive heart failure, hypertension, age ≥75 years, diabetes mellitus, stroke or transient ischemic attack, vascular disease, age 65 to 74 years, sex category) and HAS-BLED (Hypertension, Abnormal liver/renal function, Stroke history, Bleeding history or predisposition, Labile INR, Elderly, Drug/alcohol usage). The comparison was between structured data alone (International Classification of Diseases codes) and structured data plus unstructured data from clinical notes. In addition, we analyzed data from 63,296,120 participants in the Optum and Truven databases to determine the NVAF frequency, the rates of CHA2DS2-VASc ≥2 and of no contraindications to oral anticoagulants, the rates of stroke and death in the untreated population, and the first-year costs after stroke.

Results: The structured-plus-unstructured method would have identified 3,976,056 additional true NVAF cases (P […] US $13.5 billion.

Conclusions: Artificial intelligence–informed bio-surveillance combining NLP of free-text information with structured EHR data improves data completeness, prevents thousands of strokes, and saves lives and funds. This method is applicable to many disorders with profound public health consequences.
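
To illustrate how the CHA2DS2-VASc components named in the abstract add up to a score, the following is a minimal sketch using the standard published point weights (2 points each for age ≥75 years and for stroke or transient ischemic attack, 1 point for each other component, with female sex scoring the sex-category point). It is not the study's code. The function name, the condition labels, and the example patient are hypothetical, and the set union only mimics the idea of adding note-derived findings to coded ones.

```python
# Illustrative sketch only; names and example data are assumptions,
# not taken from the study's implementation.

# Standard CHA2DS2-VASc weights for the non-age, non-sex components.
WEIGHTS = {
    "congestive_heart_failure": 1,
    "hypertension": 1,
    "diabetes_mellitus": 1,
    "stroke_or_tia": 2,
    "vascular_disease": 1,
}


def cha2ds2_vasc(age, female, conditions):
    """Return the CHA2DS2-VASc score for one patient.

    `conditions` is a set of keys from WEIGHTS found for the patient.
    """
    score = sum(w for name, w in WEIGHTS.items() if name in conditions)
    if age >= 75:
        score += 2  # age >= 75 years
    elif age >= 65:
        score += 1  # age 65 to 74 years
    if female:
        score += 1  # sex category
    return score


# Hypothetical patient: one condition is coded, another appears only in notes.
structured = {"hypertension"}   # e.g. from diagnosis codes
from_notes = {"stroke_or_tia"}  # e.g. found by NLP of free text

score_structured = cha2ds2_vasc(70, False, structured)
score_combined = cha2ds2_vasc(70, False, structured | from_notes)
print(score_structured, score_combined)
```

In this made-up case the note-derived finding moves the patient across the CHA2DS2-VASc ≥2 threshold region discussed in the abstract, which is the kind of change in data completeness the study evaluates.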