Generic medical concept embedding and time decay for diverse patient outcome prediction tasks

Yupeng Li; Wei Dong; Boshu Ru; Adam Black; Xinyuan Zhang; Yuanfang Guan

iScience (Sep 2022)

Affiliations

Yupeng Li: Merck & Co., Inc., Rahway, NJ, USA
Wei Dong: Ann Arbor Algorithms Inc., Ann Arbor, MI 48104, USA
Boshu Ru: Merck & Co., Inc., Rahway, NJ, USA; Corresponding author
Adam Black: Odysseus Data Services, Cambridge, MA 02142, USA
Xinyuan Zhang: Ann Arbor Algorithms Inc., Ann Arbor, MI 48104, USA
Yuanfang Guan: Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; Corresponding author

Journal volume & issue: Vol. 25, no. 9, p. 104880

Abstract

Summary: Many fields, including natural language processing (NLP), have recently benefited from pre-training on large generic datasets to improve the accuracy of prediction tasks. However, key differences between longitudinal healthcare data (e.g., claims) and NLP tasks make the direct application of NLP pre-training methods to healthcare data inappropriate. In this article, we developed a pre-training scheme for longitudinal healthcare data that leverages the pairing of medical history and a future event. We then conducted systematic evaluations of various methods on ten patient-level prediction tasks encompassing adverse events, misdiagnosis, disease risks, and readmission. Our results show that a universal medical concept embedding pre-trained on generic big data, combined with carefully designed time decay modeling, improves the accuracy of different downstream prediction tasks while substantially reducing model size.

Published in iScience

ISSN: 2589-0042 (Online)
Publisher: Elsevier
Country of publisher: United States
LCC subjects: Science
Website: http://www.cell.com/iscience/home
