edge2vec: Representation learning using edge semantics for biomedical knowledge discovery

Zheng Gao; Gang Fu; Chunping Ouyang; Satoshi Tsutsui; Xiaozhong Liu; Jeremy Yang; Christopher Gessner; Brian Foote; David Wild; Ying Ding; Qi Yu

doi:10.1186/s12859-019-2914-2

BMC Bioinformatics (Jun 2019)

Affiliations

Zheng Gao: School of Informatics, Computing and Engineering, Indiana University
Gang Fu: Microsoft Corporation
Chunping Ouyang: University of South China
Satoshi Tsutsui: School of Informatics, Computing and Engineering, Indiana University
Xiaozhong Liu: School of Informatics, Computing and Engineering, Indiana University
Jeremy Yang: School of Informatics, Computing and Engineering, Indiana University
Christopher Gessner: School of Informatics, Computing and Engineering, Indiana University
Brian Foote: Data2Discovery, Inc.
David Wild: School of Informatics, Computing and Engineering, Indiana University
Ying Ding: School of Informatics, Computing and Engineering, Indiana University
Qi Yu: School of Management, Shanxi Medical University

Journal volume & issue: Vol. 20, no. 1, pp. 1–15

Abstract

Background: Representation learning provides new and powerful graph analytical approaches and tools for the highly valued data science challenge of mining knowledge graphs. Since previous graph analytical methods have mostly focused on homogeneous graphs, an important current challenge is extending this methodology to richly heterogeneous graphs and knowledge domains. The biomedical sciences are such a domain, reflecting the complexity of biology, with entities such as genes, proteins, drugs, diseases, and phenotypes, and relationships such as gene co-expression, biochemical regulation, and biomolecular inhibition or activation. Therefore, the semantics of edges and nodes are critical for representation learning and knowledge discovery in real-world biomedical problems.

Results: In this paper, we propose the edge2vec model, which represents graphs by taking edge semantics into account. An edge-type transition matrix is trained by an Expectation-Maximization approach, and a stochastic gradient descent model is employed to learn node embeddings on a heterogeneous graph via the trained transition matrix. edge2vec is validated on three biomedical domain tasks: biomedical entity classification, compound-gene bioactivity prediction, and biomedical information retrieval. Results show that by incorporating edge types into node embedding learning in heterogeneous graphs, edge2vec significantly outperforms state-of-the-art models on all three tasks.

Conclusions: We propose this method for its added value relative to existing graph analytical methodology and for its applicability to real-world biomedical knowledge discovery.
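To make the role of an edge-type transition matrix more concrete, the following is a minimal sketch of a random walk whose next step is weighted by the type of the edge just traversed. It is an illustration under stated assumptions, not the authors' implementation: the toy graph, the matrix values, and the function name edge_type_biased_walk are all hypothetical, the matrix is fixed by hand rather than trained by Expectation-Maximization, and the abstract does not specify how the trained matrix is actually applied.

```python
import random

# Toy heterogeneous graph (hypothetical data): node -> list of (neighbor, edge_type).
# Edge type 0 stands for a compound-gene edge, type 1 for a gene-gene or
# gene-disease edge. Edges are stored in both directions.
ADJ = {
    "drugA": [("gene1", 0)],
    "gene1": [("drugA", 0), ("gene2", 1), ("disease1", 1)],
    "gene2": [("gene1", 1)],
    "disease1": [("gene1", 1)],
}

# Hypothetical edge-type transition weights: TRANS[a][b] is the weight of
# following an edge of type b immediately after an edge of type a.
# Hand-set here so that the walk must alternate edge types; in the paper
# this matrix is learned, which this sketch does not attempt.
TRANS = [
    [0.0, 1.0],
    [1.0, 0.0],
]


def edge_type_biased_walk(adj, trans, start, length, rng):
    """Return a walk of at most `length` nodes starting at `start`.

    The first step is uniform over neighbors; every later step weights each
    candidate edge by trans[previous_edge_type][candidate_edge_type].
    The walk stops early if no candidate edge has positive weight.
    """
    walk = [start]
    node = start
    prev_type = None
    while len(walk) < length:
        neighbors = adj.get(node, [])
        if not neighbors:
            break
        if prev_type is None:
            weights = [1.0] * len(neighbors)
        else:
            weights = [trans[prev_type][etype] for _, etype in neighbors]
        if sum(weights) <= 0:
            break  # every outgoing edge type is disallowed after prev_type
        node, prev_type = rng.choices(neighbors, weights=weights, k=1)[0]
        walk.append(node)
    return walk


walk = edge_type_biased_walk(ADJ, TRANS, "drugA", 5, random.Random(0))
```

Walks generated this way could then serve as training sequences for a skip-gram style embedding model optimized by stochastic gradient descent; that downstream step is assumed here from the abstract's description and is not shown.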

Published in BMC Bioinformatics

ISSN: 1471-2105 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Biology (General)
Website: http://www.biomedcentral.com/bmcbioinformatics/
