BMC Medical Informatics and Decision Making (Apr 2019)
EHR problem list clustering for improved topic-space navigation
Abstract
Background
Patient-related information in clinical information systems accumulates over time, especially for patients with chronic diseases who undergo many hospitalizations and consultations. The diagnosis or problem list is an important feature of the electronic health record (EHR) because it provides a dynamic account of a patient's current illness and past history. In an Austrian hospital network, problem list entries are limited to fifty characters and may be linked to ICD-10 codes. The requirement to produce ICD codes for each hospital stay, together with the length limit on list items, leads to highly redundant problem lists. This conflicts with physicians' need to get a quick overview of a patient. This paper investigates a method for semantically grouping problem list items to allow fast navigation through patient-related topic spaces.
Methods
We applied a minimal language-dependent preprocessing strategy and mapped problem list entries into a numerical vector space as tf-idf weighted character 3-grams. On this representation, we applied the unweighted pair group method with arithmetic mean (UPGMA) clustering algorithm with cosine distances and inferred an optimal boundary to form semantically consistent topic spaces, considering different levels of dimensionality reduction via latent semantic analysis (LSA).
Results
We evaluated the proposed clustering approach in an intra-patient and an inter-patient scenario in combination with a natural language pipeline. We achieved an average compression rate of 80% of the initial list items, forming semantically consistent topic spaces with an F-measure greater than 0.80 in both cases. The average number of identified topics in the intra-patient case (μ_Intra = 78.4) was slightly lower than in the inter-patient case (μ_Inter = 83.4). LSA-based feature space reduction did not significantly improve performance in our investigations.
Conclusions
The investigation presented here centers on a data-driven solution to the known problem of information overload, which leads to ineffective human-computer interaction at clinicians' workplaces. We address this problem with navigable disease topic spaces in which related items are grouped, making topics easier to access.
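The Methods pipeline (tf-idf weighted character 3-grams, optional LSA, UPGMA with cosine distances, and a cut on the dendrogram) can be illustrated with a minimal Python sketch, assuming scikit-learn and SciPy are available. Here `TfidfVectorizer(analyzer="char", ngram_range=(3, 3))` builds the character 3-gram features, `TruncatedSVD` stands in for the optional LSA step, and SciPy's `linkage` with `method="average"` on condensed cosine distances is UPGMA. The function name `cluster_problem_list`, the lowercase/whitespace preprocessing, the fixed `distance_threshold` of 0.5 (the paper infers an optimal boundary instead), and the English example entries are all hypothetical; this is not the authors' implementation.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer


def cluster_problem_list(items, distance_threshold=0.5, lsa_components=None):
    """Group problem list entries into topic clusters (illustrative sketch).

    distance_threshold is a hypothetical fixed cut on the dendrogram;
    lsa_components=None skips the optional LSA reduction.
    Returns one integer cluster label per entry.
    """
    # Assumed minimal preprocessing: lowercase and collapse whitespace.
    docs = [" ".join(item.lower().split()) for item in items]

    # tf-idf weighted character 3-grams (n-grams may span word boundaries).
    tfidf = TfidfVectorizer(analyzer="char", ngram_range=(3, 3))
    X = tfidf.fit_transform(docs)

    if lsa_components is not None:
        # Optional LSA: truncated SVD of the sparse tf-idf matrix.
        X = TruncatedSVD(n_components=lsa_components, random_state=0).fit_transform(X)
    else:
        X = X.toarray()

    # Condensed pairwise cosine distances, the input format linkage() expects.
    dist = pdist(X, metric="cosine")
    # All-zero vectors (e.g. entries shorter than 3 characters) yield NaN
    # distances; treat them as unrelated and clip tiny rounding errors.
    dist = np.clip(np.nan_to_num(dist, nan=1.0), 0.0, 2.0)

    # UPGMA is average linkage on the precomputed distances.
    Z = linkage(dist, method="average")
    # Flat clusters whose members have cophenetic distance <= threshold.
    return fcluster(Z, t=distance_threshold, criterion="distance")


if __name__ == "__main__":
    # Hypothetical problem list entries, not data from the study.
    entries = [
        "Type 2 diabetes mellitus",
        "Diabetes mellitus type 2",
        "Arterial hypertension",
        "Hypertension, essential",
        "Atrial fibrillation",
    ]
    for label, entry in sorted(zip(cluster_problem_list(entries), entries)):
        print(label, entry)
```

With these hypothetical entries, the two diabetes variants share most of their character 3-grams and fall into one cluster, while the atrial fibrillation entry stays separate. Character n-grams tolerate word reordering and small spelling variants without language-specific tokenization, which fits the minimal preprocessing described above.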