Natural language processing of clinical notes enables early inborn error of immunity risk ascertainment

Kirk Roberts, PhD; Aaron T. Chin, MD; Klaus Loewy, MS; Lisa Pompeii, PhD; Harold Shin, MS; Nicholas L. Rider, DO

Journal of Allergy and Clinical Immunology: Global (May 2024)

Affiliations

Kirk Roberts, PhD: McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Tex
Aaron T. Chin, MD: Division of Immunology, Allergy, and Rheumatology, University of California, Los Angeles, Calif
Klaus Loewy, MS: Texas Children’s Hospital, Houston, Tex
Lisa Pompeii, PhD: Department of Patient Services, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio
Harold Shin, MS: College of Osteopathic Medicine, Liberty University, Lynchburg, Va
Nicholas L. Rider, DO: Division of Health System & Implementation Science, Virginia Tech Carilion School of Medicine, Roanoke, Va; Section of Allergy and Immunology, Carilion Clinic, Roanoke, Va

Corresponding author: Nicholas L. Rider, DO, Virginia Tech Carilion School of Medicine, 1 Riverside Circle, 249 Roanoke, VA 24016.

Journal volume & issue: Vol. 3, no. 2
p. 100224

Abstract

Background: Approximately 450 discrete inborn errors of immunity (IEI) have now been described; however, diagnostic rates remain suboptimal. Structured health record data have proven useful for patient detection but may be augmented by natural language processing (NLP). Here we present a machine learning model that can distinguish patients from controls well in advance of the date of ultimate diagnosis.

Objective: We sought to create an NLP machine learning algorithm that could identify IEI patients early in the disease course and shorten the diagnostic odyssey.

Methods: We extracted a large corpus of IEI patient clinical-note text from a major referral center’s electronic health record (EHR) system, along with a matched control corpus for comparison. We built text classifiers with simple machine learning methods and trained them on progressively longer time epochs before the date of diagnosis.

Results: The top-performing NLP algorithm robustly distinguished cases from controls 36 months before ultimate clinical diagnosis (area under the precision-recall curve > 0.95). Corpus analysis demonstrated that statistically enriched, IEI-relevant terms were evident 24+ months before diagnosis, validating that clinical notes can provide a signal for early prediction of IEI.

Conclusion: Mining EHR notes with NLP holds promise for improving early IEI patient detection.
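
The two quantitative ideas in the abstract, restricting notes to a time window before diagnosis and scoring a classifier by area under the precision-recall curve, can be illustrated with a minimal sketch. This is not the authors' pipeline. The function names, the reading of an "epoch" as "notes dated at least N months before diagnosis," the 30.44-day average month, and the step-wise average-precision approximation are all assumptions made for illustration.

```python
from datetime import date, timedelta


def notes_before_cutoff(notes, diagnosis_date, months_before):
    """Keep note texts dated at least `months_before` months before diagnosis.

    `notes` is a list of (note_date, text) pairs. A month is approximated
    as 30.44 days (an assumption; the source does not define the epochs).
    """
    cutoff = diagnosis_date - timedelta(days=round(months_before * 30.44))
    return [text for note_date, text in notes if note_date <= cutoff]


def average_precision(labels, scores):
    """Step-wise approximation of the area under the precision-recall curve.

    `labels` are 0/1 (1 = case), `scores` are classifier outputs where a
    higher score means "more likely a case". Tied scores are not handled
    specially in this sketch.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            ap += hits / rank  # precision at each recalled positive
    return ap / total_pos


# Hypothetical toy data, not taken from the study.
toy_notes = [
    (date(2016, 6, 1), "recurrent sinusitis, low IgG"),
    (date(2018, 1, 1), "referred to immunology"),
]
early_notes = notes_before_cutoff(toy_notes, date(2020, 1, 1), months_before=36)
toy_ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```

In a setup like this, a classifier would be trained and scored separately for each cutoff (for example 12, 24, and 36 months), so that performance can be compared across how far ahead of diagnosis the model is asked to predict.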

Published in Journal of Allergy and Clinical Immunology: Global

ISSN: 2772-8293 (Online)
Publisher: Elsevier
Country of publisher: United States
LCC subjects: Medicine: Internal medicine: Specialties of internal medicine: Immunologic diseases. Allergy
Website: https://www.sciencedirect.com/journal/journal-of-allergy-and-clinical-immunology-global