Cardiovascular Digital Health Journal (Oct 2021)

Natural language processing of implantable cardioverter-defibrillator reports in hypertrophic cardiomyopathy: A paradigm for longitudinal device follow-up

  • Konstantinos C. Siontis, MD,
  • Huzefa Bhopalwala, MD,
  • Nakeya Dewaswala, MD,
  • Christopher G. Scott, MS,
  • Peter A. Noseworthy, MD, FHRS,
  • Jeffrey B. Geske, MD,
  • Steve R. Ommen, MD,
  • Rick A. Nishimura, MD,
  • Michael J. Ackerman, MD, PhD,
  • Paul A. Friedman, MD, FHRS,
  • Adelaide M. Arruda-Olson, MD, PhD

Journal volume & issue
Vol. 2, no. 5
pp. 264–269

Abstract

Background: The follow-up of implantable cardioverter-defibrillators (ICDs) generates large amounts of valuable structured and unstructured data embedded in device interrogation reports.

Objective: We aimed to build a natural language processing (NLP) model for automated capture of ICD-recorded events from device interrogation reports, using a single-center cohort of patients with hypertrophic cardiomyopathy (HCM).

Methods: A total of 687 ICD interrogation reports from 247 HCM patients were included. Using a derivation set of 480 reports, we developed a rule-based NLP algorithm that uses the unstructured (free-text) data in the interpretation field of the ICD reports to identify sustained atrial and ventricular arrhythmias and ICD therapies. A separate model based on structured, tabulated numerical data was also developed. Both models were tested on the remaining 207 reports, held out as a separate test set. Diagnostic performance was determined against arrhythmia and ICD therapy annotations generated by expert manual review of the same reports.

Results: For arrhythmia and ICD therapy events, respectively, the NLP system achieved sensitivities of 0.98 and 0.99 and F1-scores of 0.98 and 0.92. In contrast, the structured data model performed significantly worse, with sensitivities of 0.33 and 0.76 and F1-scores of 0.45 and 0.78.

Conclusion: An automated NLP system can capture arrhythmia events and ICD therapies from unstructured device interrogation reports in HCM with high accuracy. These findings demonstrate the feasibility of an NLP paradigm for extracting data for clinical care and research from ICD reports embedded in the electronic health record.
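The abstract describes a rule-based NLP algorithm applied to the free-text interpretation field, but it does not publish the rules. The following is a minimal sketch of what such a rule-based extractor can look like, assuming a keyword lexicon plus a simple NegEx-style negation check and exclusion of non-sustained mentions. The lexicon, `EVENT_PATTERNS`, `extract_events`, and the other names are hypothetical illustrations, not the authors' algorithm.

```python
import re

# HYPOTHETICAL lexicon for illustration only; the study's actual rules are not
# given in the abstract.
EVENT_PATTERNS = {
    "atrial_arrhythmia": re.compile(
        r"\b(atrial fibrillation|atrial flutter|atrial tachycardia)\b", re.IGNORECASE
    ),
    "ventricular_arrhythmia": re.compile(
        r"\b(ventricular tachycardia|ventricular fibrillation|VT|VF)\b", re.IGNORECASE
    ),
    # "shock" followed by impedance/lead/coil is a lead measurement, not a therapy.
    "icd_therapy": re.compile(
        r"\b(shocks?(?!\s+(?:impedance|lead|coil))|ATP|anti-?tachycardia pacing)\b",
        re.IGNORECASE,
    ),
}

# NegEx-style cue: a negation word earlier in the same clause negates a finding.
NEGATION_CUE = re.compile(r"\b(no|not|without|negative for)\b", re.IGNORECASE)
# A mention immediately qualified as non-sustained is not a sustained event.
NONSUSTAINED = re.compile(r"\bnon-?sustained\s*$", re.IGNORECASE)
# Clause boundaries: sentence punctuation, newlines, and "but" end a negation's scope.
CLAUSE_SPLIT = re.compile(r"[.;\n]|\bbut\b", re.IGNORECASE)


def extract_events(interpretation: str) -> set[str]:
    """Return the event labels asserted (not negated) in a free-text interpretation."""
    found: set[str] = set()
    for clause in CLAUSE_SPLIT.split(interpretation):
        for label, pattern in EVENT_PATTERNS.items():
            for match in pattern.finditer(clause):
                prefix = clause[: match.start()]
                if NEGATION_CUE.search(prefix) or NONSUSTAINED.search(prefix):
                    continue  # negated or non-sustained mention; keep looking
                found.add(label)
                break  # one affirmed mention per label per clause is enough
    return found


if __name__ == "__main__":
    report = (
        "Two episodes of ventricular fibrillation, each terminated by a single shock. "
        "No atrial fibrillation. One episode of nonsustained VT. "
        "Shock impedance 58 ohms."
    )
    print(sorted(extract_events(report)))  # prints ['icd_therapy', 'ventricular_arrhythmia']
```

Scoping negation to a clause keeps "No VT but atrial flutter noted" from negating the flutter. The trade-off is that a negation cue anywhere earlier in the clause suppresses every later match, so a production rule set would need a bounded negation window and a much larger, validated lexicon.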
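The reported F1-score is the harmonic mean of precision (positive predictive value, P) and sensitivity (recall, R): F1 = 2PR / (P + R). Solving for P gives P = F1 · R / (2R − F1), so the reported sensitivity/F1 pairs also imply an approximate precision. The sketch below scores model output against expert annotation and back-solves precision. It assumes report-level binary scoring per event type, which the abstract does not specify, and the names `Confusion`, `compare`, and `implied_precision` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # reports where both the model and the expert reference flag the event
    fp: int  # model flags the event, reference does not
    fn: int  # reference flags the event, model misses it

    @property
    def sensitivity(self) -> float:  # recall: TP / (TP + FN)
        return self.tp / (self.tp + self.fn) if self.tp + self.fn else 0.0

    @property
    def precision(self) -> float:  # positive predictive value: TP / (TP + FP)
        return self.tp / (self.tp + self.fp) if self.tp + self.fp else 0.0

    @property
    def f1(self) -> float:  # harmonic mean of precision and sensitivity
        p, r = self.precision, self.sensitivity
        return 2 * p * r / (p + r) if p + r else 0.0


def compare(
    predicted: dict[str, set[str]], reference: dict[str, set[str]], label: str
) -> Confusion:
    """Score one event label per report against the manual reference (assumed unit)."""
    tp = fp = fn = 0
    for report_id, ref_labels in reference.items():
        pred = label in predicted.get(report_id, set())
        ref = label in ref_labels
        tp += int(pred and ref)
        fp += int(pred and not ref)
        fn += int(ref and not pred)
    return Confusion(tp, fp, fn)


def implied_precision(sensitivity: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, giving P = F1 * R / (2R - F1)."""
    return f1 * sensitivity / (2 * sensitivity - f1)


if __name__ == "__main__":
    # (sensitivity, F1) pairs as reported in the abstract. The inputs are rounded
    # to two decimals, so the back-solved precision is only approximate.
    reported = {
        "NLP, arrhythmia": (0.98, 0.98),
        "NLP, ICD therapy": (0.99, 0.92),
        "Structured, arrhythmia": (0.33, 0.45),
        "Structured, ICD therapy": (0.76, 0.78),
    }
    for name, (sens, f1) in reported.items():
        print(f"{name}: implied precision ~{implied_precision(sens, f1):.2f}")
```

Because F1 penalizes false positives as well as missed events, a model can have near-perfect sensitivity yet a noticeably lower F1, which is the pattern seen for NLP detection of ICD therapies.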

Keywords