Information extraction from German radiological reports for general clinical text and language understanding

Michael Jantscher; Felix Gunzer; Roman Kern; Eva Hassler; Sebastian Tschauner; Gernot Reishofer

doi:10.1038/s41598-023-29323-3

Scientific Reports (Feb 2023)

Affiliations

Michael Jantscher: Know-Center
Felix Gunzer: Division of Neuroradiology, Vascular and Interventional Radiology, Department of Radiology, Medical University Graz
Roman Kern: Know-Center
Eva Hassler: Division of Neuroradiology, Vascular and Interventional Radiology, Department of Radiology, Medical University Graz
Sebastian Tschauner: Division of Pediatric Radiology, Department of Radiology, Medical University Graz
Gernot Reishofer: Department of Radiology, Medical University Graz

Journal volume & issue: Vol. 13, no. 1
pp. 1 – 12

Abstract

Recent advances in deep learning and natural language processing (NLP) have opened many new opportunities for automatic text understanding and text processing in the medical field. This is of great benefit, as many clinical downstream tasks rely on information from unstructured clinical documents. However, for low-resource languages like German, using modern text processing applications that require large amounts of training data is difficult, because only a few data sets are available, mainly due to legal restrictions. In this study, we present an information extraction framework that was initially pre-trained on real-world computed tomography (CT) reports of head examinations, followed by domain-adaptive fine-tuning on reports from different imaging examinations. We show that in the pre-training phase, the semantic and contextual meaning of one clinical reporting domain can be captured and effectively transferred to other clinical imaging examinations. Moreover, we introduce an active learning approach with an intrinsic strategic sampling method to generate highly informative training data with low human annotation cost. We find that model performance can be significantly improved by an appropriate selection of the data to be annotated, without the need to train the model on a specific downstream task. With a general annotation scheme that can be used not only in radiology but also in a broader clinical setting, we contribute to a more consistent labeling and annotation process that also facilitates the verification and evaluation of language models in the German clinical setting.
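
The abstract does not describe how the intrinsic sampling strategy works, so the following is only a hypothetical illustration of the general idea of picking informative reports for annotation without a downstream-task model. It is a minimal Python sketch of greedy k-center (farthest-point) selection over report embeddings, assuming each report has already been turned into a vector by a pre-trained language model. The function name `k_center_greedy` and the toy vectors are illustrative assumptions, not the authors' method.

```python
import math
from typing import List, Sequence


def k_center_greedy(embeddings: Sequence[Sequence[float]], k: int,
                    seed_index: int = 0) -> List[int]:
    """Hypothetical label-free selection: greedily pick k reports whose
    embeddings spread out over the embedding space (farthest-point sampling).
    Assumes embeddings come from a pre-trained model; no labels are used."""
    n = len(embeddings)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of reports")
    selected = [seed_index]
    chosen = {seed_index}
    # Distance from every report to its nearest already-selected report.
    min_dist = [math.dist(e, embeddings[seed_index]) for e in embeddings]
    while len(selected) < k:
        # The unselected report farthest from all selected ones is the
        # least "covered" and is proposed next for human annotation.
        next_idx = max((i for i in range(n) if i not in chosen),
                       key=lambda i: min_dist[i])
        selected.append(next_idx)
        chosen.add(next_idx)
        # Update nearest-selected distances with the newly added report.
        for i, e in enumerate(embeddings):
            d = math.dist(e, embeddings[next_idx])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


if __name__ == "__main__":
    # Toy 2-D "embeddings" of five reports (made-up values).
    toy = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
    print(k_center_greedy(toy, 3))  # [0, 4, 2]
```

With the toy vectors, near-duplicate reports (indices 0 and 1, or 2 and 3) are not both selected, which is the intuition behind diversity-based sampling: annotation effort goes to reports that add new information. A real pipeline would operate on model embeddings and might combine diversity with other informativeness criteria.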

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/