MKG-GC: A multi-task learning-based knowledge graph construction framework with personalized application to gastric cancer

Yang Yang; Yuwei Lu; Zixuan Zheng; Hao Wu; Yuxin Lin; Fuliang Qian; Wenying Yan

Computational and Structural Biotechnology Journal (Dec 2024)

Affiliations

Yang Yang: Computing Science and Artificial Intelligence College, Suzhou City University, Suzhou 215004, China; School of Computer Science & Technology, Soochow University, Suzhou 215000, China
Yuwei Lu: School of Computer Science & Technology, Soochow University, Suzhou 215000, China
Zixuan Zheng: School of Computer Science & Technology, Soochow University, Suzhou 215000, China
Hao Wu: Department of Bioinformatics, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou 215123, China
Yuxin Lin: Center for Systems Biology, Soochow University, Suzhou 215123, China; Department of Urology, the First Affiliated Hospital of Soochow University, Suzhou 215000, China
Fuliang Qian: Center for Systems Biology, Soochow University, Suzhou 215123, China; Medical Center of Soochow University, Suzhou 215123, China; Jiangsu Province Engineering Research Center of Precision Diagnostics and Therapeutics Development, Soochow University, Suzhou 215123, China; Corresponding author at: Center for Systems Biology, Soochow University, Suzhou 215123, China.
Wenying Yan: Department of Bioinformatics, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou 215123, China; Center for Systems Biology, Soochow University, Suzhou 215123, China; Jiangsu Province Engineering Research Center of Precision Diagnostics and Therapeutics Development, Soochow University, Suzhou 215123, China; Corresponding author at: Department of Bioinformatics, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou 215123, China.

Journal volume & issue: Vol. 23, pp. 1339–1347

Abstract

Over the past decade, information relevant to precision medicine has accumulated largely in the form of textual data. To make effective use of this expanding body of medical text, we proposed MKG, a multi-task learning framework based on hard parameter sharing for knowledge graph construction, and used it to automatically extract gastric cancer (GC)-related biomedical knowledge from the literature and to identify GC drug candidates. MKG comprises three modules, MT-BGIPN, MT-SGTF and MT-ScBERT, for entity recognition, entity normalization and relation classification, respectively. To address the long and irregular names of medical entities, MT-BGIPN combines a bidirectional gated recurrent unit with an interactive pointer network, improving entity recognition to an average F1 score of 84.5% across datasets. MT-SGTF combines term frequency-inverse document frequency (TF-IDF) with a gated attention unit to capture both the semantic and the characteristic features of entities, achieving an average Hits@1 score of 94.5% across five datasets. MT-ScBERT integrates cross-text, entity and context features, yielding an average F1 score of 86.9% across 11 relation classification datasets. Using MKG, we then built a GC-specific knowledge graph (MKG-GC) containing 9129 entities and 88,482 triplets. Next, MKG-GC was used to predict potential GC drugs with a pre-trained language model, BioKGE-BERT, and a drug-disease discriminant model based on CNN-BiLSTM. Notably, nine of the top ten predicted drugs have previously been reported as effective in gastric cancer treatment. Finally, an online platform for exploring and visualizing MKG-GC is available at https://www.yanglab-mi.org.cn/MKG-GC/.
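The entity normalization step is scored with Hits@1: the fraction of entity mentions whose correct concept is ranked first among the candidates. As a rough illustration of how a TF-IDF signal can rank candidate concepts and how Hits@k is then computed, the sketch below uses character-trigram TF-IDF vectors and cosine similarity. It is not the authors' MT-SGTF model: it omits the gated attention unit and any semantic encoder, and the trigram size, the smoothed IDF formula (ln((1 + N) / (1 + df)) + 1, the default form in scikit-learn's TfidfVectorizer), the helper names and the toy concept list are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical sketch: ranks candidate concept names for a mention by the cosine
# similarity of character-trigram TF-IDF vectors, then scores rankings with Hits@k.
# Only a TF-IDF surface-form signal; NOT the MT-SGTF model (no gated attention unit).

def char_ngrams(text, n=3):
    """Character n-grams of a lower-cased string padded with '#' boundary marks."""
    padded = f"#{text.lower()}#"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def build_idf(concepts, n=3):
    """Smoothed inverse document frequency of each n-gram over the concept names."""
    df = Counter()
    for name in concepts:
        df.update(set(char_ngrams(name, n)))
    total = len(concepts)
    return {g: math.log((1 + total) / (1 + c)) + 1.0 for g, c in df.items()}

def tfidf_vector(text, idf, n=3):
    """Sparse TF-IDF vector; n-grams unseen in the vocabulary get weight 0."""
    tf = Counter(char_ngrams(text, n))
    return {g: count * idf.get(g, 0.0) for g, count in tf.items()}

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_concepts(mention, concepts, idf, n=3):
    """Concept names sorted from most to least similar to the mention."""
    mv = tfidf_vector(mention, idf, n)
    scored = [(cosine(mv, tfidf_vector(c, idf, n)), c) for c in concepts]
    return [c for _, c in sorted(scored, key=lambda s: -s[0])]

def hits_at_k(ranked_lists, gold, k=1):
    """Fraction of mentions whose gold concept appears in the top k candidates."""
    hits = sum(1 for ranked, g in zip(ranked_lists, gold) if g in ranked[:k])
    return hits / len(gold)

# Toy, hypothetical vocabulary and mentions (not taken from the MKG-GC datasets).
CONCEPTS = ["gastric cancer", "stomach neoplasm", "helicobacter pylori infection",
            "cisplatin", "fluorouracil"]
IDF = build_idf(CONCEPTS)
mentions = ["gastric cancers", "5-fluorouracil", "cisplatinum"]
gold = ["gastric cancer", "fluorouracil", "cisplatin"]
ranked = [rank_concepts(m, CONCEPTS, IDF) for m in mentions]
print(hits_at_k(ranked, gold, k=1))  # → 1.0
```

A purely surface-form score like this handles spelling variants well but cannot link true synonyms with little character overlap (for example, a mention of "stomach cancer" against a concept named "stomach neoplasm"). That gap is why a semantic component, such as the gated attention unit described for MT-SGTF, would be paired with TF-IDF features.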

Published in Computational and Structural Biotechnology Journal

ISSN: 2001-0370 (Online)
Publisher: Elsevier
Country of publisher: Netherlands
LCC subjects: Technology: Chemical technology: Biotechnology
Website: https://www.journals.elsevier.com/computational-and-structural-biotechnology-journal