Information Extraction from German Clinical Care Documents in Context of Alzheimer’s Disease

Lisa Langnickel; Kilian Krockauer; Mischa Uebachs; Sebastian Schaaf; Sumit Madan; Thomas Klockgether; Juliane Fluck

doi:10.3390/app112210717

Applied Sciences (Nov 2021)

Affiliations

Lisa Langnickel: Knowledge Management, ZB MED—Information Centre for Life Sciences, 50931 Cologne, Germany
Kilian Krockauer: IT Department, University Hospital Bonn, 53127 Bonn, Germany
Mischa Uebachs: Department of Neurology, DRK Kamillus Klinik Asbach, 53567 Asbach, Germany
Sebastian Schaaf: HPC and Scientific Computing, German Center for Neurodegenerative Diseases (DZNE) within the Helmholtz Association, 53127 Bonn, Germany
Sumit Madan: Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing SCAI, 53757 Sankt Augustin, Germany
Thomas Klockgether: Department of Neurology, University Hospital Bonn, 53127 Bonn, Germany
Juliane Fluck: Knowledge Management, ZB MED—Information Centre for Life Sciences, 50931 Cologne, Germany

Journal volume & issue: Vol. 11, no. 22, p. 10717

Abstract

Dementia affects approximately 50 million people worldwide, the majority of whom suffer from Alzheimer’s disease (AD). The availability of long-term patient data is one of the most important prerequisites for a better understanding of diseases. Worldwide, many prospective, longitudinal cohort studies have been initiated to understand AD. However, this approach takes years to enroll and follow up a substantial number of patients, resulting in a current lack of data. This raises the question of whether clinical routine datasets could be used to extend collected registry data. It is therefore necessary to assess what kind of information is available in memory clinic routine databases. We did this using the University Hospital Bonn as an example. Whereas a number of data items are available in machine-readable formats, additional valuable information is stored in textual documents. Information can only be extracted from such documents via text mining methods. Therefore, we set up modular, rule-based text mining workflows requiring minimal sets of training data. The system achieves F1-scores over 95% for the most relevant classes, i.e., memory disturbances from medical reports and quantitative scores from semi-structured neuropsychological test protocols. Thus, we created a machine-readable core dataset covering more than 8000 patient visits across a ten-year period.
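To make the idea of rule-based score extraction and F1 evaluation concrete, the following is a minimal illustrative sketch in Python. It is not the authors' workflow: the regular expression, the test names (MMSE, MoCA, CERAD), the sample protocol text, and the gold annotations are all made-up assumptions, chosen only to show how a rule can pull "score/maximum" pairs out of semi-structured text and how an F1-score is computed against a gold standard.

```python
import re

# Hypothetical rule (an assumption, not taken from the paper): a test name,
# an optional colon, then "score/maximum".
SCORE_PATTERN = re.compile(
    r"(?P<test>MMSE|MoCA)\s*:?\s*(?P<score>\d{1,2})\s*/\s*(?P<maximum>\d{1,2})"
)


def extract_scores(text):
    """Return the set of (test, score, maximum) tuples matched in the text."""
    return {
        (m.group("test"), int(m.group("score")), int(m.group("maximum")))
        for m in SCORE_PATTERN.finditer(text)
    }


def f1_score(predicted, gold):
    """Compute the F1-score from sets of predicted and gold annotations."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Made-up protocol snippet and gold annotations, for illustration only.
protocol = "MMSE: 27/30\nMoCA 22 / 30\nUhrentest unauffaellig"
gold = {("MMSE", 27, 30), ("MoCA", 22, 30), ("CERAD", 18, 30)}

predicted = extract_scores(protocol)
print(sorted(predicted))
print(round(f1_score(predicted, gold), 3))
```

In this toy example the rule finds two of the three gold annotations and produces no false positives, so precision is perfect while recall is reduced by the score the pattern does not cover. A modular rule set in this spirit would add one pattern per test or finding.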

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
