Natural language processing to identify lupus nephritis phenotype in electronic health records

Yu Deng; Jennifer A. Pacheco; Anika Ghosh; Anh Chung; Chengsheng Mao; Joshua C. Smith; Juan Zhao; Wei-Qi Wei; April Barnado; Chad Dorn; Chunhua Weng; Cong Liu; Adam Cordon; Jingzhi Yu; Yacob Tedla; Abel Kho; Rosalind Ramsey-Goldman; Theresa Walunas; Yuan Luo

doi:10.1186/s12911-024-02420-7

BMC Medical Informatics and Decision Making (Mar 2024)

Affiliations

Yu Deng: Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University
Jennifer A. Pacheco: Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University
Anika Ghosh: Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University
Anh Chung: Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University
Chengsheng Mao: Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University
Joshua C. Smith: Department of Biomedical Informatics, Vanderbilt University Medical Center
Juan Zhao: Department of Biomedical Informatics, Vanderbilt University Medical Center
Wei-Qi Wei: Department of Biomedical Informatics, Vanderbilt University Medical Center
April Barnado: Department of Medicine, Vanderbilt University Medical Center
Chad Dorn: Department of Biomedical Informatics, Vanderbilt University Medical Center
Chunhua Weng: Department of Biomedical Informatics, Columbia University
Cong Liu: Department of Biomedical Informatics, Columbia University
Adam Cordon: Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University
Jingzhi Yu: Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University
Yacob Tedla: Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University
Abel Kho: Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University
Rosalind Ramsey-Goldman: Department of Medicine/Rheumatology, Feinberg School of Medicine, Northwestern University
Theresa Walunas: Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University
Yuan Luo: Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University

DOI: https://doi.org/10.1186/s12911-024-02420-7
Journal volume & issue: Vol. 22, no. S2
pp. 1 – 11

Abstract

Background
Systemic lupus erythematosus (SLE) is a rare autoimmune disorder characterized by an unpredictable course of flares and remission with diverse manifestations. Lupus nephritis, one of the major manifestations of SLE with respect to organ damage and mortality, is a key component of lupus classification criteria. Accurately identifying lupus nephritis in electronic health records (EHRs) would therefore benefit large observational cohort studies and clinical trials, where characterizing the patient population is critical for recruitment, study design, and analysis. Lupus nephritis can be recognized through procedure codes and structured data such as laboratory tests. However, other critical information documenting lupus nephritis, such as histologic findings from kidney biopsies and narratives of prior medical history, requires sophisticated text processing to extract from pathology reports and clinical notes. In this study, we developed algorithms to identify lupus nephritis with and without natural language processing (NLP) using EHR data from the Northwestern Medicine Enterprise Data Warehouse (NMEDW).

Methods
We developed five algorithms: a rule-based algorithm using only structured data (the baseline algorithm) and four algorithms using different NLP models. The first NLP model combined a simple regular-expression keyword search with structured data. The other three NLP models were based on regularized logistic regression and used different feature sets: positive mentions of concept unique identifiers (CUIs), the number of appearances of CUIs, and a mixture of three components (i.e., a curated list of CUIs, regular expression concepts, and structured data), respectively. The baseline algorithm and the best-performing NLP algorithm were externally validated on a dataset from Vanderbilt University Medical Center (VUMC).
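The three logistic-regression variants differ mainly in how each patient is turned into a feature vector. The Python sketch below illustrates that idea under stated assumptions; it is not the authors' pipeline. It builds binary-mention, count, and mixed feature vectors and fits an L2-regularized logistic regression by batch gradient descent. Assumptions: CUIs have already been extracted from the notes (the conclusion names MetaMap for this step) and arrive as plain lists; the keyword pattern, the placeholder CUIs `C_A`/`C_B`, the single structured flag, and the choice of an L2 penalty are all hypothetical, because the abstract specifies none of them.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL keyword pattern: the abstract does not list the expressions used.
# A plain regex also ignores negation (e.g. "no lupus nephritis" still matches).
LN_PATTERN = re.compile(
    r"\blupus\s+nephritis\b|\bclass\s+(?:III|IV|V)\s+nephritis\b", re.IGNORECASE
)


def regex_flag(notes):
    """Return 1.0 if any note matches the keyword pattern, else 0.0."""
    return 1.0 if any(LN_PATTERN.search(note) for note in notes) else 0.0


def cui_features(cuis, vocab, mode="binary"):
    """Project a patient's extracted CUIs onto a fixed CUI vocabulary.

    mode="binary": positive mention (1.0 if the CUI appears at all).
    mode="count":  number of appearances of each CUI.
    """
    counts = Counter(cuis)
    if mode == "binary":
        return [1.0 if counts[c] > 0 else 0.0 for c in vocab]
    return [float(counts[c]) for c in vocab]


def mixed_features(cuis, notes, structured, vocab):
    """Curated CUIs + regex concept flag + structured-data values."""
    return cui_features(cuis, vocab, "binary") + [regex_flag(notes)] + list(structured)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logreg(X, y, lam=0.1, lr=0.5, epochs=500):
    """L2-regularized logistic regression fit by batch gradient descent.

    Minimizes mean log-loss + (lam / 2) * ||w||^2; the bias is not penalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x, threshold=0.5):
    """Classify one feature vector as lupus nephritis (1) or not (0)."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold)


# Toy data: "C_A"/"C_B" are placeholder CUIs, and the single structured value
# stands in for a hypothetical flag such as a kidney-biopsy procedure code.
vocab = ["C_A", "C_B"]
patients = [
    (["C_A", "C_A"], ["History of lupus nephritis, class IV."], [1.0]),
    (["C_A"], ["Renal biopsy consistent with class III nephritis."], [1.0]),
    (["C_B"], ["SLE with arthritis, no renal involvement."], [0.0]),
    ([], ["Routine follow-up visit."], [0.0]),
]
labels = [1, 1, 0, 0]

X = [mixed_features(cuis, notes, s, vocab) for cuis, notes, s in patients]
w, b = train_logreg(X, labels)
print([predict(w, b, x) for x in X])
```

Replacing `mixed_features` with `cui_features(cuis, vocab, "count")` gives the count-based variant, and `mode="binary"` alone gives the positive-mention variant. In practice a library trainer (for example scikit-learn's `LogisticRegression`, whose `penalty` parameter selects L1 or L2) would replace the hand-written loop.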

Results
Our best-performing NLP model incorporated features from structured data, regular expression concepts, and mapped CUIs. Compared with the baseline lupus nephritis algorithm, it improved the F-measure in both the NMEDW dataset (0.79 vs. 0.41) and the VUMC dataset (0.93 vs. 0.52).

Conclusion
Our NLP MetaMap mixed model substantially improved the F-measure over the structured-data-only algorithm in both the internal and external validation datasets. NLP algorithms can serve as powerful tools for accurately identifying the lupus nephritis phenotype in EHRs for clinical research and better-targeted therapies.
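The algorithms are compared by F-measure. A minimal Python sketch of that metric follows, assuming the standard F1 definition (the harmonic mean of precision and recall against gold-standard phenotype labels); the abstract does not describe the evaluation code, and the labels below are invented for illustration, not taken from the study.

```python
def confusion_counts(gold, pred):
    """Return (true positives, false positives, false negatives) for binary labels."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    return tp, fp, fn


def f_measure(tp, fp, fn, beta=1.0):
    """F-beta score; beta=1 gives F1, the harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Made-up labels (1 = lupus nephritis), not data from the study.
gold = [1, 1, 1, 1, 0, 0, 0, 0]
pred = [1, 1, 0, 0, 1, 0, 0, 0]
tp, fp, fn = confusion_counts(gold, pred)
print(tp, fp, fn, round(f_measure(tp, fp, fn), 3))  # prints: 2 1 2 0.571
```

Because F1 uses only true positives, false positives, and false negatives, it is not inflated by the many patients correctly classified as negative.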

Published in BMC Medical Informatics and Decision Making

ISSN: 1472-6947 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: http://bmcmedinformdecismak.biomedcentral.com
