Natural language processing in urology: Automated extraction of clinical information from histopathology reports of uro-oncology procedures

Honghong Huang; Fiona Xin Yi Lim; Gary Tianyu Gu; Matthew Jiangchou Han; Andrew Hao Sen Fang; Elian Hui San Chia; Eileen Yen Tze Bei; Sarah Zhuling Tham; Henry Sun Sien Ho; John Shyi Peng Yuen; Aixin Sun; Jay Kheng Sit Lim

Heliyon (Apr 2023)

Affiliations

Honghong Huang: Department of Urology, Singapore General Hospital, Singapore; Corresponding author. Academia Level 5, Department of Urology, Singapore General Hospital, 20 College Road, Singapore 169856, Singapore.
Fiona Xin Yi Lim: School of Computer Science and Engineering, Nanyang Technological University, Singapore
Gary Tianyu Gu: Department of Diagnostic Radiology, Singapore General Hospital, Singapore
Matthew Jiangchou Han: Department of Future Health System, Singapore General Hospital, Singapore
Andrew Hao Sen Fang: Doctor Anywhere, Singapore
Elian Hui San Chia: Office of Insights & Analytics, Singapore Health Services, Singapore
Eileen Yen Tze Bei: Department of Urology, Singapore General Hospital, Singapore
Sarah Zhuling Tham: Department of Urology, Singapore General Hospital, Singapore
Henry Sun Sien Ho: Department of Urology, Singapore General Hospital, Singapore
John Shyi Peng Yuen: Department of Urology, Singapore General Hospital, Singapore
Aixin Sun: School of Computer Science and Engineering, Nanyang Technological University, Singapore; Project Principal Investigators, Singapore
Jay Kheng Sit Lim: Department of Urology, Singapore General Hospital, Singapore; Project Principal Investigators, Singapore

Journal volume & issue: Vol. 9, no. 4
p. e14793

Abstract

Objectives: We aimed to automate routine extraction of clinically relevant unstructured information from uro-oncological histopathology reports by applying rule-based and machine learning (ML)/deep learning (DL) methods to develop an oncology-focused natural language processing (NLP) algorithm.

Methods: Our algorithm combines a rule-based approach with support vector machines/neural networks (BioBERT/Clinical BERT), and is optimised for accuracy. We randomly extracted 5772 uro-oncological histology reports from 2008 to 2018 from electronic health records (EHRs) and split the data into training and validation datasets in an 80:20 ratio. The training dataset was annotated by medical professionals and reviewed by cancer registrars. The validation dataset was annotated by cancer registrars and defined as the gold standard with which the algorithm outcomes were compared. The accuracy of NLP-parsed data was assessed against these human annotations. As per our cancer registry definition, we defined an accuracy rate of >95% as “acceptable” for professional human extraction.

Results: There were 11 extraction variables in 268 free-text reports. Our algorithm achieved accuracy rates between 61.2% and 99.0%. Of the 11 data fields, 8 met the acceptable accuracy standard, while the other 3 had accuracy rates between 61.2% and 89.7%. Notably, the rule-based approach was shown to be more effective and robust in extracting variables of interest. The ML/DL models, on the other hand, had poorer predictive performance owing to a highly imbalanced data distribution, writing styles that varied between reports, and differences from the data used for the domain-specific pre-trained models.

Conclusion: We designed an NLP algorithm that can accurately automate clinical information extraction from histopathology reports, with an overall average micro accuracy of 93.3%.
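To make the evaluation described in the abstract concrete, the following is a minimal Python sketch of a single rule-based extractor scored against gold-standard annotations. It is not the authors' implementation. The Gleason-score rule, the field name `gleason`, the sample reports, and the helper names `extract_gleason` and `field_accuracy` are all hypothetical, and "micro accuracy" is assumed here to mean the proportion of correct values pooled over every field in every report.

```python
import re
from typing import Optional

# Hypothetical rule (not from the paper): match text such as
# "Gleason score 3 + 4 = 7" and capture the two pattern grades.
GLEASON_RULE = re.compile(r"gleason\s+score\s*:?\s*(\d)\s*\+\s*(\d)", re.IGNORECASE)

# Threshold the abstract calls "acceptable" (accuracy must exceed 95%).
ACCEPTABLE = 0.95


def extract_gleason(report: str) -> Optional[str]:
    """Return the Gleason pattern as 'a+b', or None if the rule does not fire."""
    match = GLEASON_RULE.search(report)
    return f"{match.group(1)}+{match.group(2)}" if match else None


def field_accuracy(predicted, gold):
    """Compare predictions with gold annotations, report by report.

    Returns (per-field accuracy, micro accuracy). Micro accuracy is assumed
    to pool every field-level decision across all reports.
    """
    correct, total = {}, {}
    for pred, ref in zip(predicted, gold):
        for field, ref_value in ref.items():
            total[field] = total.get(field, 0) + 1
            if pred.get(field) == ref_value:
                correct[field] = correct.get(field, 0) + 1
    per_field = {f: correct.get(f, 0) / total[f] for f in total}
    micro = sum(correct.values()) / sum(total.values())
    return per_field, micro


# Illustrative reports and annotations, invented for this sketch.
reports = [
    "Prostate biopsy. Gleason score 3 + 4 = 7.",
    "Radical prostatectomy. Gleason Score: 4+3.",
    "Prostate, core biopsy: adenocarcinoma, Gleason 4 + 5.",  # the rule misses this phrasing
]
gold = [{"gleason": "3+4"}, {"gleason": "4+3"}, {"gleason": "4+5"}]

predicted = [{"gleason": extract_gleason(r)} for r in reports]
per_field, micro = field_accuracy(predicted, gold)

for field, acc in per_field.items():
    status = "acceptable" if acc > ACCEPTABLE else "below threshold"
    print(f"{field}: {acc:.1%} ({status})")
print(f"micro accuracy: {micro:.1%}")
```

The third report shows how variation in writing style can defeat a narrowly written rule. In a sketch like this, such misses would be found by comparing against the annotations and then widening the pattern.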

Published in Heliyon

ISSN: 2405-8440 (Online)
Publisher: Elsevier
Country of publisher: United Kingdom
LCC subjects: Science: Science (General); Social Sciences: Social sciences (General)
Website: https://www.cell.com/heliyon/home
