Neuroscience Informatics (Jun 2023)
Rules-based natural language processing to extract features of large vessel occlusion and cerebral edema from radiology reports in stroke patients
Abstract
Background: Research on large vessel occlusion (LVO) stroke offers limited data on which patient groups are at high risk for complications, including cerebral edema. Large, well-phenotyped cohorts could provide insight, but identifying such cohorts and extracting outcomes manually is impractical. Natural language processing (NLP) software has previously extracted stroke characteristics from radiology reports, but LVO classification and acute stroke outcomes have not been extracted together in an integrated pipeline.
Methods: We constructed a rules-based NLP pipeline that extracted the presence and location of arterial occlusion and core/penumbral volumes from multimodal CT reports, along with the presence of edema and midline shift on follow-up CTs. The algorithm flagged inconsistent reports for manual adjudication. We validated performance in two cohorts and analyzed the associations between NLP-extracted variables and clinical edema outcomes.
Results: The algorithm identified occlusions in the development (n=577) and test (n=442) cohorts with 94% and 85% recall, respectively, increasing to 97% and 93% after review of flagged reports. It distinguished proximal ICA/M1 from distal occlusions with 96% recall and correctly extracted 98% of core/penumbral volumes. NLP recall was 93% and 86% for identifying edema and midline shift, respectively, from follow-up reports of 213 patients with ICA/MCA occlusions. NLP-extracted radiographic edema captured 89% of patients who developed clinical cerebral edema, which was more likely in patients with NLP-identified proximal versus distal occlusions and was associated with significantly higher core/penumbral volumes.
Conclusion: A rules-based NLP pipeline can accurately identify and phenotype an LVO cohort, yielding clinical associations with implications for stroke research.
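The abstract does not give the actual rule set, so the following Python sketch is only an illustration of how a rules-based extractor of this kind might be structured. The regular expressions, the `extract_features` function name, the proximal/distal vocabulary, and the flagging criterion (affirmed and negated occlusion statements in the same report, or an occlusion with no recognizable vessel) are all assumptions made for the example, not the authors' method.

```python
import re

# Hypothetical vocabularies; the study's real rules are not stated in the abstract.
OCCLUSION_TERM = re.compile(r"\b(occlusion|occluded|cutoff|cut-off)\b", re.I)
NEGATION = re.compile(r"\b(no|without|negative for|absence of)\b", re.I)
PROXIMAL = re.compile(r"\b(ICA|internal carotid|M1)\b", re.I)
DISTAL = re.compile(r"\b(M2|M3|M4)\b", re.I)
# Matches e.g. "core volume of 35 mL" or "penumbral volume: 80 cc".
VOLUME = re.compile(
    r"\b(core|penumbra)\w*\D{0,40}?(\d+(?:\.\d+)?)\s*(?:ml|cc)\b", re.I
)


def extract_features(report: str) -> dict:
    """Apply sentence-level rules to one radiology report (illustrative only)."""
    positive, negated = [], []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", report):
        if not OCCLUSION_TERM.search(sentence):
            continue
        if NEGATION.search(sentence):
            negated.append(sentence)
        else:
            positive.append(sentence)

    # Prefer a proximal label if any affirmed sentence names ICA/M1.
    location = None
    for sentence in positive:
        if PROXIMAL.search(sentence):
            location = "proximal"
            break
        if DISTAL.search(sentence):
            location = "distal"

    volumes = {
        m.group(1).lower(): float(m.group(2)) for m in VOLUME.finditer(report)
    }

    return {
        "occlusion": bool(positive),
        "location": location,
        "volumes_ml": volumes,
        # Assumed criterion: mixed affirmed/negated statements, or an
        # affirmed occlusion with no recognizable vessel, goes to a human.
        "flag_for_review": bool(positive) and (bool(negated) or location is None),
    }
```

For instance, calling `extract_features("There is abrupt occlusion of the left M1 segment. Core volume of 35 mL and penumbra volume of 80 mL.")` would label the report as a proximal occlusion with both volumes extracted and no review flag, whereas a report that affirms an M2 occlusion and separately negates an ICA occlusion would be flagged for manual adjudication under the assumed criterion.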