PLoS Computational Biology (Aug 2011)

Using electronic patient records to discover disease correlations and stratify patient cohorts.

  • Francisco S Roque,
  • Peter B Jensen,
  • Henriette Schmock,
  • Marlene Dalgaard,
  • Massimo Andreatta,
  • Thomas Hansen,
  • Karen Søeby,
  • Søren Bredkjær,
  • Anders Juul,
  • Thomas Werge,
  • Lars J Jensen,
  • Søren Brunak

DOI
https://doi.org/10.1371/journal.pcbi.1002141
Journal volume & issue
Vol. 7, no. 8
p. e1002141

Abstract


Electronic patient records remain a largely unexplored but potentially rich data source for discovering correlations between diseases. We describe a general approach for gathering phenotypic descriptions of patients from medical records in a systematic, cohort-independent manner. By extracting phenotype information from the free text in such records, we demonstrate that we can extend the information contained in the structured record data and use it to produce fine-grained patient stratification and disease co-occurrence statistics. The approach uses a dictionary based on the International Classification of Diseases (ICD) ontology and is therefore, in principle, language independent. As a use case, we show how records from a Danish psychiatric hospital lead to the identification of disease correlations, which can subsequently be mapped to systems biology frameworks.
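The abstract does not give implementation details, so the following is only a minimal sketch of the general idea under stated assumptions. It maps free-text terms to ICD-10 codes with a dictionary, merges the mined codes with each patient's structured codes, and computes pairwise disease co-occurrence counts. The dictionary `TERM_TO_ICD`, its English terms, and all function names are hypothetical; a real dictionary would be derived from the ICD ontology in the language of the records (Danish, in the use case). The observed/expected ratio used here is an assumed, simple co-occurrence statistic and is not necessarily the measure used by the authors.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy dictionary: free-text term -> ICD-10 code (assumption, not the authors' dictionary).
TERM_TO_ICD = {
    "schizophrenia": "F20",
    "depression": "F32",
    "depressive episode": "F32",
    "obesity": "E66",
    "type 2 diabetes": "E11",
}

# One case-insensitive, word-bounded pattern per dictionary term.
PATTERNS = [
    (re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE), code)
    for term, code in TERM_TO_ICD.items()
]


def extract_codes(text):
    """Return the set of ICD-10 codes whose dictionary terms occur in the text."""
    return {code for pattern, code in PATTERNS if pattern.search(text)}


def patient_phenotypes(structured, notes):
    """Per patient, union the structured codes with codes mined from free-text notes."""
    profiles = {}
    for pid in set(structured) | set(notes):
        codes = set(structured.get(pid, ()))
        for note in notes.get(pid, ()):
            codes |= extract_codes(note)
        profiles[pid] = codes
    return profiles


def cooccurrence(profiles):
    """Count patients per code pair and return (pair counts, observed/expected ratios).

    Expected count for a pair assumes the two codes occur independently:
    count(a) * count(b) / number_of_patients. A ratio above 1 means the pair
    co-occurs more often than independence would predict.
    """
    n = len(profiles)
    single = Counter()
    pairs = Counter()
    for codes in profiles.values():
        single.update(codes)
        pairs.update(combinations(sorted(codes), 2))
    ratios = {
        (a, b): observed / (single[a] * single[b] / n)
        for (a, b), observed in pairs.items()
    }
    return pairs, ratios
```

For example, with structured codes `{"p1": {"F20"}, "p2": {"F32"}, "p3": set(), "p4": {"F20"}}` and notes such as `"Patient has obesity and schizophrenia."` for `p1`, the mined `E66` code is added to `p1`'s profile, extending the structured data. Plain substring matching like this ignores negation ("no signs of schizophrenia" would still match), which any practical text-mining step would need to handle.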