Time-sensitive clinical concept embeddings learned from large electronic health records

Yang Xiang; Jun Xu; Yuqi Si; Zhiheng Li; Laila Rasmy; Yujia Zhou; Firat Tiryaki; Fang Li; Yaoyun Zhang; Yonghui Wu; Xiaoqian Jiang; Wenjin Jim Zheng; Degui Zhi; Cui Tao; Hua Xu

doi:10.1186/s12911-019-0766-3

BMC Medical Informatics and Decision Making (Apr 2019)

Affiliations

Yang Xiang: School of Biomedical Informatics, The University of Texas Health Science Center at Houston
Jun Xu: School of Biomedical Informatics, The University of Texas Health Science Center at Houston
Yuqi Si: School of Biomedical Informatics, The University of Texas Health Science Center at Houston
Zhiheng Li: School of Biomedical Informatics, The University of Texas Health Science Center at Houston
Laila Rasmy: School of Biomedical Informatics, The University of Texas Health Science Center at Houston
Yujia Zhou: School of Biomedical Informatics, The University of Texas Health Science Center at Houston
Firat Tiryaki: School of Biomedical Informatics, The University of Texas Health Science Center at Houston
Fang Li: School of Biomedical Informatics, The University of Texas Health Science Center at Houston
Yaoyun Zhang: School of Biomedical Informatics, The University of Texas Health Science Center at Houston
Yonghui Wu: Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida
Xiaoqian Jiang: School of Biomedical Informatics, The University of Texas Health Science Center at Houston
Wenjin Jim Zheng: School of Biomedical Informatics, The University of Texas Health Science Center at Houston
Degui Zhi: School of Biomedical Informatics, The University of Texas Health Science Center at Houston
Cui Tao: School of Biomedical Informatics, The University of Texas Health Science Center at Houston
Hua Xu: School of Biomedical Informatics, The University of Texas Health Science Center at Houston

DOI: https://doi.org/10.1186/s12911-019-0766-3
Journal volume & issue: Vol. 19, no. S2, pp. 139–148

Abstract

Background: Learning distributional representations of clinical concepts (e.g., diseases, drugs, and labs) is an important research area of deep learning in the medical domain. However, many existing methods do not consider temporal dependencies along the longitudinal sequence of a patient's records, which may lead to incorrect selection of contexts.

Methods: To address this issue, we extended three popular concept embedding learning methods, word2vec, positive pointwise mutual information (PPMI), and FastText, to take time-sensitive information into account. We then trained them on a large electronic health records (EHR) database containing about 50 million patients to generate concept embeddings. We evaluated the embeddings with intrinsic evaluations focusing on concept similarity measures and with an extrinsic evaluation assessing their use in the task of predicting disease onset.

Results: Our experiments show that embeddings learned from information within one visit (time window zero) improve performance on the concept similarity measure, and that the FastText algorithm usually performed better than the other two algorithms. For the predictive modeling task, the best result was achieved by word2vec embeddings with a 30-day sliding window.

Conclusions: Considering time constraints is important in training clinical concept embeddings. We expect these time-sensitive embeddings to benefit a range of downstream applications.
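To make the idea of a time window concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes each patient record is a list of (day offset, concept code) tuples, treats two concepts as context for each other when their dates differ by at most `window_days`, and computes PPMI scores from the resulting pairs. The function names, the data layout, and the example codes are all hypothetical; the abstract does not specify how contexts were actually constructed.

```python
import math
from collections import Counter


def time_window_pairs(events, window_days):
    """Return (target, context) pairs for one patient.

    events: list of (day_offset, concept_code) tuples (hypothetical layout).
    Two different concepts form a pair when their dates differ by at most
    window_days; window_days=0 keeps only same-day co-occurrences.
    """
    events = sorted(events)  # order by day so the inner loop can stop early
    pairs = []
    for i, (day_i, concept_i) in enumerate(events):
        for day_j, concept_j in events[i + 1:]:
            if day_j - day_i > window_days:
                break  # all later events are even further away in time
            if concept_i != concept_j:
                pairs.append((concept_i, concept_j))
                pairs.append((concept_j, concept_i))
    return pairs


def ppmi(pairs):
    """Compute positive pointwise mutual information for each observed pair."""
    pair_counts = Counter(pairs)
    total = sum(pair_counts.values())
    target_counts = Counter()
    context_counts = Counter()
    for (target, context), n in pair_counts.items():
        target_counts[target] += n
        context_counts[context] += n
    return {
        (target, context): max(
            0.0,
            math.log(n * total / (target_counts[target] * context_counts[context])),
        )
        for (target, context), n in pair_counts.items()
    }


# Hypothetical patient: two codes on the same day, one lab 45 days later.
patient = [(0, "dx:diabetes"), (0, "rx:metformin"), (45, "lab:hba1c")]

same_visit = time_window_pairs(patient, window_days=0)
wide_window = time_window_pairs(patient, window_days=60)
scores = ppmi(same_visit)
print(len(same_visit), len(wide_window))
```

With a window of zero, only the two same-day concepts are paired. Widening the window to 60 days also pulls in the later lab result, which is the kind of context selection the time constraint is meant to control.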

Published in BMC Medical Informatics and Decision Making

ISSN: 1472-6947 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: http://bmcmedinformdecismak.biomedcentral.com
