Computational and Structural Biotechnology Journal (Dec 2024)

BioKGrapher: Initial evaluation of automated knowledge graph construction from biomedical literature

  • Henning Schäfer,
  • Ahmad Idrissi-Yaghir,
  • Kamyar Arzideh,
  • Hendrik Damm,
  • Tabea M.G. Pakull,
  • Cynthia S. Schmidt,
  • Mikel Bahn,
  • Georg Lodde,
  • Elisabeth Livingstone,
  • Dirk Schadendorf,
  • Felix Nensa,
  • Peter A. Horn,
  • Christoph M. Friedrich

Journal volume & issue
Vol. 24
pp. 639 – 660

Abstract


Background
The growth of biomedical literature makes it increasingly difficult to extract and structure knowledge. Knowledge Graphs (KGs) address this by representing relationships between biomedical entities. However, manual construction of KGs is labor-intensive and time-consuming, which highlights the need for automated methods. This work introduces BioKGrapher, a tool for automatic KG construction from large-scale publication data, with a focus on biomedical concepts related to specific medical conditions. BioKGrapher allows researchers to construct KGs from PubMed IDs.

Methods
The BioKGrapher pipeline begins with Named Entity Recognition and Linking (NER+NEL) to extract biomedical concepts from PubMed and normalize them by mapping them to the Unified Medical Language System (UMLS). The extracted concepts are weighted and re-ranked using Kullback-Leibler divergence and local frequency balancing. These concepts are then integrated into hierarchical KGs, with relationships derived from terminologies such as SNOMED CT and NCIt. Downstream applications include multi-label document classification using Adapter-infused Transformer models.

Results
The concepts generated by BioKGrapher align with clinical practice guidelines from the German Guideline Program in Oncology (GGPO), achieving F1-Scores of up to 0.6. In multi-label classification across three BERT variants, Adapter-infused models using a cancer-specific KG built with BioKGrapher improved micro F1-Scores by up to 0.89 percentage points over a non-specific KG and by up to 2.16 points over the base models. The drug-disease extraction case study identified indications for Nivolumab and Rituximab.

Conclusion
BioKGrapher is a tool for automatic KG construction whose output aligns with the GGPO guidelines and improves downstream task performance. It offers a scalable solution for managing biomedical knowledge, with potential applications in literature recommendation, decision support, and drug repurposing.
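The abstract states that extracted concepts are weighted with Kullback-Leibler divergence but does not give the exact formula. The following is a minimal sketch, assuming a common pointwise KL weighting: each concept c is scored as p_fg(c) * log(p_fg(c) / p_bg(c)), where p_fg is its relative frequency in the condition-specific (foreground) publications and p_bg its relative frequency in a general (background) corpus. The additive smoothing parameter `alpha`, the function name `kl_concept_weights`, and the placeholder concept labels (stand-ins for UMLS CUIs) are all illustrative assumptions. The "local frequency balancing" step is not modeled here.

```python
import math
from collections import Counter


def kl_concept_weights(foreground, background, alpha=1.0):
    """Rank concepts by their pointwise KL-divergence contribution.

    Hypothetical helper (not BioKGrapher's API).
    foreground: concept IDs extracted from condition-specific documents.
    background: concept IDs extracted from a general reference corpus.
    alpha: additive smoothing (assumption) so that concepts unseen in the
           background corpus do not cause a division by zero.
    """
    fg, bg = Counter(foreground), Counter(background)
    vocab = set(fg) | set(bg)
    # Smoothed normalizers: total counts plus alpha for every vocabulary item.
    fg_total = sum(fg.values()) + alpha * len(vocab)
    bg_total = sum(bg.values()) + alpha * len(vocab)

    weights = {}
    for concept in fg:  # only concepts that occur in the foreground are scored
        p_fg = (fg[concept] + alpha) / fg_total
        p_bg = (bg[concept] + alpha) / bg_total
        # Positive when the concept is over-represented in the foreground.
        weights[concept] = p_fg * math.log(p_fg / p_bg)

    # Highest weight first: the most condition-specific concepts lead.
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)


# Toy data with placeholder labels instead of real UMLS CUIs.
foreground = ["melanoma"] * 5 + ["nivolumab"] * 3 + ["patient"] * 4
background = ["melanoma"] * 1 + ["patient"] * 20 + ["diabetes"] * 10

ranked = kl_concept_weights(foreground, background)
for concept, weight in ranked:
    print(f"{concept}\t{weight:.3f}")
```

In this toy example, the disease and drug concepts rank above the generic concept "patient", which receives a negative weight because it is more frequent in the background corpus. This illustrates why a divergence-based weighting favors condition-specific concepts over ubiquitous ones.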

Keywords