iScience (Jun 2022)
A graph-embedded topic model enables characterization of diverse pain phenotypes among UK biobank individuals
Abstract
Summary: Large biobank repositories of clinical condition and medication data open opportunities to investigate the phenotypic disease network. We present a graph-embedded topic model (GETM), which integrates existing biomedical knowledge graph information, in the form of pre-trained graph embeddings, into the embedded topic model. Using a variational autoencoder framework, we infer each patient's phenotypic mixture by modeling multi-modal discrete patient medical records. We applied GETM to UK Biobank (UKB) self-reported clinical phenotype data, which contain 443 self-reported medical conditions and 802 medications for 457,461 individuals. Compared to existing methods, GETM demonstrates good imputation performance. In a more focused application to characterizing pain phenotypes, we observe that GETM-inferred phenotypes not only accurately predict the status of chronic musculoskeletal (CMK) pain but also reveal known pain-related topics. Intriguingly, medications and conditions in the cardiovascular category are enriched among the most predictive topics of chronic pain.
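To make the modeling idea concrete, the following is a minimal sketch of how an embedded topic model can turn fixed feature embeddings into topic distributions for two modalities. It assumes the standard embedded-topic-model decoder, in which each topic's distribution over features is a softmax of inner products between a topic embedding and the feature embeddings; the abstract does not spell out GETM's exact parameterization. The random matrices stand in for pre-trained graph embeddings, the sharing of topic embeddings across conditions and medications is an assumption, and all names (`rho_cond`, `rho_med`, `alpha`, `theta`) and sizes are hypothetical. The variational encoder that would infer `theta` from a patient's records is omitted.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def topic_distributions(topic_emb, feature_emb):
    """Row k is the softmax over features of <topic_k, feature_j>."""
    return [softmax([dot(t, f) for f in feature_emb]) for t in topic_emb]


def reconstruct(theta, beta):
    """Mix the topic distributions by one patient's topic proportions."""
    n_features = len(beta[0])
    return [
        sum(theta[k] * beta[k][j] for k in range(len(theta)))
        for j in range(n_features)
    ]


def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
n_topics, emb_dim = 3, 4              # toy sizes, not the paper's
n_conditions, n_medications = 5, 6    # toy sizes, not 443 and 802

# Stand-ins for pre-trained graph embeddings (held fixed in this sketch).
rho_cond = rand_matrix(n_conditions, emb_dim)
rho_med = rand_matrix(n_medications, emb_dim)

# Topic embeddings, assumed here to be shared by both modalities.
alpha = rand_matrix(n_topics, emb_dim)

beta_cond = topic_distributions(alpha, rho_cond)  # topics x conditions
beta_med = topic_distributions(alpha, rho_med)    # topics x medications

# One patient's topic mixture; in a VAE this would come from the encoder.
theta = softmax([random.gauss(0.0, 1.0) for _ in range(n_topics)])

p_cond = reconstruct(theta, beta_cond)  # probabilities over conditions
p_med = reconstruct(theta, beta_med)    # probabilities over medications
```

Under these assumptions, `p_cond` and `p_med` are the per-modality probabilities that such a model would score against a patient's observed conditions and medications, which is also what imputation of held-out entries would rank.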