Mining a stroke knowledge graph from literature

Xi Yang; Chengkun Wu; Goran Nenadic; Wei Wang; Kai Lu

doi:10.1186/s12859-021-04292-4

BMC Bioinformatics (Jul 2021)

Affiliations

Xi Yang: College of Computer, National University of Defence Technology
Chengkun Wu: State Key Laboratory of High-Performance Computing, National University of Defence Technology
Goran Nenadic: Department of Computer Science, University of Manchester
Wei Wang: College of Computer, National University of Defence Technology
Kai Lu: College of Computer, National University of Defence Technology

DOI: https://doi.org/10.1186/s12859-021-04292-4
Journal volume & issue: Vol. 22, no. S10
pp. 1 – 19

Abstract

Background: Stroke has an acute onset and a high mortality rate, making it one of the most fatal diseases worldwide. Its underlying biology and treatments have been widely studied in both “Western” biomedicine and Traditional Chinese Medicine (TCM). However, these two approaches are often studied and reported in isolation, both in the literature and in associated databases.

Results: To aid research into effective prevention methods and treatments, we integrated knowledge from the literature and a number of databases (e.g. CID, TCMID, ETCM). We employed a suite of biomedical text mining (i.e. named-entity recognition) approaches to identify mentions of genes, diseases, drugs, chemicals, symptoms, Chinese herbs, patent medicines, etc. in a large set of stroke papers from both the biomedical and TCM domains. Then, combining a rule-based approach with a pre-trained BioBERT model, we extracted and classified links and relationships among stroke-related entities as expressed in the literature. We constructed StrokeKG, a knowledge graph that includes almost 46 k nodes of nine types and 157 k links of 30 types, connecting diseases, genes, symptoms, drugs, pathways, herbs, chemicals, ingredients and patent medicines.

Conclusions: StrokeKG can provide practical and reliable stroke-related knowledge to support stroke research, for example by suggesting new research directions and ideas for drug repurposing and discovery. We make StrokeKG freely available at http://114.115.208.144:7474/browser/ (please click "Connect" directly) and the source structured data for stroke at https://github.com/yangxi1016/Stroke
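To make the idea of a graph with typed nodes and typed links more concrete, the following is a minimal Python sketch. It is not the authors' implementation and does not reflect the actual StrokeKG schema: the class names (`Node`, `TypedGraph`), the relation labels (`associated_with`, `treats`) and the entities (`GENE_X`, `HERB_Y`) are hypothetical placeholders chosen purely for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """A graph node with a name and a type (hypothetical labels)."""
    name: str
    node_type: str  # e.g. "disease", "gene", "herb"


@dataclass
class TypedGraph:
    """Toy typed graph: source node -> list of (relation type, target node)."""
    edges: dict = field(default_factory=lambda: defaultdict(list))

    def add_link(self, source: Node, relation: str, target: Node) -> None:
        # Store one directed, typed link.
        self.edges[source].append((relation, target))

    def neighbours(self, source: Node, node_type: str) -> list:
        # Targets of `source` restricted to one node type.
        return [t for _, t in self.edges.get(source, []) if t.node_type == node_type]

    def count_links(self) -> int:
        # Total number of stored links.
        return sum(len(links) for links in self.edges.values())


# Invented placeholder triples, not data from StrokeKG.
stroke = Node("stroke", "disease")
gene_x = Node("GENE_X", "gene")
herb_y = Node("HERB_Y", "herb")

kg = TypedGraph()
kg.add_link(stroke, "associated_with", gene_x)
kg.add_link(herb_y, "treats", stroke)

print(kg.neighbours(stroke, "gene"))
print(kg.count_links())
```

Filtering neighbours by node type is the kind of query a typed graph is meant to support, for instance listing only the genes linked to a disease. In a graph database such as the Neo4j instance linked above, the same question would be asked with a query language rather than in application code.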

Published in BMC Bioinformatics

ISSN: 1471-2105 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Biology (General)
Website: http://www.biomedcentral.com/bmcbioinformatics/
