Nature Communications (Nov 2023)

TransformEHR: transformer-based encoder-decoder generative model to enhance prediction of disease outcomes using electronic health records

  • Zhichao Yang,
  • Avijit Mitra,
  • Weisong Liu,
  • Dan Berlowitz,
  • Hong Yu

DOI
https://doi.org/10.1038/s41467-023-43715-z
Journal volume & issue
Vol. 14, no. 1
pp. 1 – 10

Abstract

Transformer-based deep learning models using longitudinal electronic health records (EHRs) have shown great success in predicting clinical diseases and outcomes. Pretraining on a large dataset can help such models map the input space better and boost their performance on relevant tasks through finetuning with limited data. In this study, we present TransformEHR, a generative transformer-based encoder-decoder model that is pretrained using a new pretraining objective: predicting all diseases and outcomes of a patient at a future visit from previous visits. TransformEHR's encoder-decoder framework, paired with this novel pretraining objective, helps it achieve new state-of-the-art performance on multiple clinical prediction tasks. Compared with the previous model, TransformEHR improves the area under the precision–recall curve by 2% (p < 0.001) for pancreatic cancer onset and by 24% (p = 0.007) for intentional self-harm in patients with post-traumatic stress disorder. The high performance in predicting intentional self-harm shows the potential of TransformEHR for building effective clinical intervention systems. TransformEHR is also generalizable and can easily be finetuned for clinical prediction tasks with limited data.
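To make the pretraining objective concrete, here is a minimal sketch of how training pairs for "predict all diseases and outcomes at a future visit from previous visits" could be built from a patient's visit history. It assumes each visit is represented as a set of diagnosis code strings; the function name `make_pretraining_pairs`, the code strings, and the sorted target order are illustrative assumptions, not details taken from the paper. The point it shows is that the decoder target is the entire set of codes at the next visit rather than a single next code.

```python
from typing import List, Set, Tuple

# Assumption: a visit is a set of diagnosis/outcome code strings.
Visit = Set[str]
Pair = Tuple[List[List[str]], List[str]]


def make_pretraining_pairs(visits: List[Visit]) -> List[Pair]:
    """Hypothetical helper: for every visit t >= 1, pair the history
    visits[0..t-1] (encoder input) with ALL codes of visit t (decoder target)."""
    pairs: List[Pair] = []
    for t in range(1, len(visits)):
        # Encoder input: previous visits in chronological order.
        history = [sorted(v) for v in visits[:t]]
        # Decoder target: every code at the future visit. Sorting is only an
        # assumed convention to give a generative decoder a deterministic order.
        target = sorted(visits[t])
        pairs.append((history, target))
    return pairs


if __name__ == "__main__":
    # Illustrative codes only (ICD-10-style strings chosen for readability).
    visits = [{"E11.9", "I10"}, {"I10", "F43.10"}, {"F43.10", "X78"}]
    for history, target in make_pretraining_pairs(visits):
        print(history, "->", target)
```

A patient with n visits yields n − 1 pairs, and a patient with a single visit yields none, since there is no earlier history to condition on.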