Rules-based natural language processing to extract features of large vessel occlusion and cerebral edema from radiology reports in stroke patients

Zohair Siddiqui; Kunal Bhatia; Aaron Corbin; Rajat Dhar

Neuroscience Informatics (Jun 2023)

Affiliations

Zohair Siddiqui: Saint Louis University, School of Medicine, St. Louis, MO, United States of America; Corresponding author. Address: 660 S. Euclid Ave., Campus Box 8111, St. Louis, MO 63110, United States of America.
Kunal Bhatia: Department of Neurology, Washington University in St. Louis, School of Medicine, St. Louis, MO, United States of America
Aaron Corbin: Saint Louis University, School of Medicine, St. Louis, MO, United States of America
Rajat Dhar: Department of Neurology, Washington University in St. Louis, School of Medicine, St. Louis, MO, United States of America

Journal volume & issue: Vol. 3, no. 2, p. 100129

Abstract

Background: Research on large vessel occlusion (LVO) stroke offers limited insight into which patient groups are at high risk for complications such as cerebral edema. Large, well-phenotyped cohorts could provide such insights, but identifying these cohorts and manually extracting outcomes is impractical. Natural language processing (NLP) software has previously been used to extract stroke characteristics from radiology reports, but there has not been an integrated extraction of both LVO classification and acute stroke outcomes.

Methods: We constructed a rules-based NLP pipeline that extracted the presence and location of arterial occlusion and core/penumbral volumes from multimodal CT reports, along with the presence of edema and midline shift from follow-up CT reports. The algorithm flagged inconsistent reports for manual adjudication. We validated performance in two cohorts and analyzed the associations between NLP-extracted variables and clinical edema outcomes.

Results: The algorithm identified occlusions in the development (n=577) and test (n=442) cohorts with 94% and 85% recall, respectively, increasing to 97% and 93% after review of flagged reports. It distinguished proximal ICA/M1 occlusions from distal occlusions with 96% recall and correctly extracted 98% of core/penumbral volumes. On follow-up reports of 213 patients with ICA/MCA occlusions, NLP recall was 93% for identifying edema and 86% for identifying midline shift. NLP-extracted radiographic edema captured 89% of patients who developed clinical cerebral edema. Clinical cerebral edema was more likely in patients with NLP-identified proximal vs distal occlusions and was associated with significantly higher core/penumbral volumes.

Conclusion: A rules-based NLP pipeline can accurately identify and phenotype an LVO cohort, yielding clinical associations with implications for stroke research.
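The Methods summary describes rule-based extraction of occlusion status, occlusion location, core/penumbral volumes, and follow-up edema and midline shift, with inconsistent reports flagged for review. The Python sketch below illustrates what rules of that kind might look like. It is not the authors' pipeline: the abstract does not publish its vocabulary or logic, so every pattern here (for example `OCCLUSION_TERMS` and `DISTAL_VESSELS`), the simplified NegEx-style negation rule, and the flagging criteria are hypothetical assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional

# HYPOTHETICAL rule patterns for illustration only; the published pipeline's
# actual vocabulary, negation handling, and flagging rules are not given.
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")
NEGATION_CUES = re.compile(r"\b(no|without|negative for)\b", re.IGNORECASE)
OCCLUSION_TERMS = re.compile(r"\b(occlusion|occluded|cut-?off)\b", re.IGNORECASE)
PROXIMAL_VESSELS = re.compile(r"\b(ICA|internal carotid|M1)\b", re.IGNORECASE)
DISTAL_VESSELS = re.compile(r"\b(M2|M3|M4)\b", re.IGNORECASE)  # assumed distal set
VOLUME = re.compile(
    r"\b(core|penumbral?)\b\D{0,30}?(\d+(?:\.\d+)?)\s*(?:ml|cc)\b", re.IGNORECASE
)
EDEMA_TERMS = re.compile(r"\b(edema|swelling)\b", re.IGNORECASE)
SHIFT_TERMS = re.compile(r"\bmidline shift\b", re.IGNORECASE)


def is_affirmed(sentence: str, match: re.Match) -> bool:
    """Simplified NegEx-style rule: negated if a cue precedes the term in its sentence."""
    return NEGATION_CUES.search(sentence[: match.start()]) is None


@dataclass
class CtaFeatures:
    occlusion: Optional[bool] = None  # None means no occlusion statement was found
    location: Optional[str] = None    # "proximal" (ICA/M1), "distal", or None
    volumes: Dict[str, float] = field(default_factory=dict)
    flagged: bool = False             # True routes the report to manual adjudication


def extract_cta_features(report: str) -> CtaFeatures:
    feats = CtaFeatures()
    positive = negative = False
    for sentence in SENTENCE_SPLIT.split(report):
        match = OCCLUSION_TERMS.search(sentence)
        if match and is_affirmed(sentence, match):
            positive = True
            # A proximal vessel anywhere in an affirmed sentence takes priority.
            if PROXIMAL_VESSELS.search(sentence):
                feats.location = "proximal"
            elif DISTAL_VESSELS.search(sentence) and feats.location is None:
                feats.location = "distal"
        elif match:
            negative = True
        for name, value in VOLUME.findall(sentence):
            key = "core" if name.lower() == "core" else "penumbra"
            feats.volumes[key] = float(value)
    if positive or negative:
        feats.occlusion = positive
    # Contradictory (affirmed and negated) or incomplete (no location) findings get flagged.
    feats.flagged = (positive and negative) or (positive and feats.location is None)
    return feats


def extract_followup_features(report: str) -> Dict[str, bool]:
    """Report whether edema and midline shift are affirmed on a follow-up CT."""
    found = {"edema": False, "midline_shift": False}
    for sentence in SENTENCE_SPLIT.split(report):
        for key, pattern in (("edema", EDEMA_TERMS), ("midline_shift", SHIFT_TERMS)):
            match = pattern.search(sentence)
            if match and is_affirmed(sentence, match):
                found[key] = True
    return found


if __name__ == "__main__":
    cta = ("CTA demonstrates occlusion of the left M1 segment. "
           "Core volume 12 mL; penumbral volume 85 mL.")
    print(extract_cta_features(cta))
    print(extract_followup_features("Cytotoxic edema without midline shift."))
```

Running the script prints the features extracted from a short sample CTA report and a sample follow-up report. A realistic rule set would need much broader synonym lists, laterality and report-section handling, and more careful negation scoping; the appeal of a rules-based design is that each decision traces back to a specific pattern, which fits a workflow where flagged reports are adjudicated manually.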

Published in Neuroscience Informatics

ISSN: 2772-5286 (Online)
Publisher: Elsevier
Country of publisher: France
LCC subjects: Medicine: Internal medicine: Neurosciences. Biological psychiatry. Neuropsychiatry
Website: https://www.journals.elsevier.com/neuroscience-informatics