A knowledge empowered explainable gene ontology fingerprint approach to improve gene functional explication and prediction
Ying Wang,
Hui Zong,
Fan Yang,
Yuantao Tong,
Yujia Xie,
Zeyu Zhang,
Honglian Huang,
Rongbin Zheng,
Shuangkuai Wang,
Danqi Huang,
Fanglin Tan,
Shiyang Cheng,
M. James C. Crabbe,
Xiaoyan Zhang
Affiliations
Ying Wang
Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China; Department of Laboratory Medicine, Shanghai Eastern Hepatobiliary Surgery Hospital, Shanghai 200438, China
Hui Zong
Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China; Institutes for Systems Genetics, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu 610041, China
Fan Yang
State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, the First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou 310058, China
Yuantao Tong
Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
Yujia Xie
Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
Zeyu Zhang
Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
Honglian Huang
Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
Rongbin Zheng
Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
Shuangkuai Wang
Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
Danqi Huang
Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
Fanglin Tan
Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
Shiyang Cheng
Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
M. James C. Crabbe
Wolfson College, Oxford University, Oxford OX2 6UD, UK; Institute of Biomedical and Environmental Science & Technology, University of Bedfordshire, Luton LU1 3JU, UK; School of Life Sciences, Shanxi University, Taiyuan 030006, China
Xiaoyan Zhang
Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China; Corresponding author
Summary: Functional explication of genes is of great scientific value. However, conventional methods struggle with genes that may affect biological processes but are not annotated in public databases. Here, we developed a novel explainable gene ontology fingerprint (XGOF) method that automatically builds knowledge networks from the biomedical literature of a given field and quantitatively characterizes the associations between genes and ontologies. XGOF provides systematic knowledge about the potential functions of genes and, by integrating omics data, supports ontology-level comparison of the similarities and discrepancies among different disease-XGOFs. More importantly, XGOF can not only help infer the major cellular components of a disease microenvironment but also reveal novel gene panels or functions for in-depth experimental research, even where few explicit connections to diseases have previously been described in the literature. The reliability of XGOF is validated in four application scenarios, demonstrating a unique perspective on integrating text and data mining, with the potential to accelerate scientific discovery.