Applied Sciences (Nov 2021)
Information Extraction from German Clinical Care Documents in Context of Alzheimer’s Disease
Abstract
Dementia affects approximately 50 million people worldwide, the majority of whom suffer from Alzheimer’s disease (AD). The availability of long-term patient data is one of the most important prerequisites for a better understanding of diseases. Many prospective, longitudinal cohort studies have been initiated around the world to understand AD. However, such studies take years to enroll and follow up a substantial number of patients, so data are currently lacking. This raises the question of whether clinical routine datasets could be used to extend collected registry data. It is, therefore, necessary to assess what kind of information is available in memory clinic routine databases. We carried out this assessment using the University Hospital Bonn as an example. Whereas a number of data items are available in machine-readable formats, additional valuable information is stored in textual documents. Extracting information from such documents is only possible via text mining methods. Therefore, we set up modular, rule-based text mining workflows requiring minimal sets of training data. The system achieves F1-scores over 95% for the most relevant classes, i.e., memory disturbances from medical reports and quantitative scores from semi-structured neuropsychological test protocols. In this way, we created a machine-readable core dataset covering more than 8000 patient visits over a ten-year period.
Keywords