Artificial Intelligence in the Life Sciences (Dec 2023)

A natural language processing system for the efficient updating of highly curated pathophysiology mechanism knowledge graphs

  • Negin Sadat Babaiha,
  • Hassan Elsayed,
  • Bide Zhang,
  • Abish Kaladharan,
  • Priya Sethumadhavan,
  • Bruce Schultz,
  • Jürgen Klein,
  • Bruno Freudensprung,
  • Vanessa Lage-Rupprecht,
  • Alpha Tom Kodamullil,
  • Marc Jacobs,
  • Stefan Geissler,
  • Sumit Madan,
  • Martin Hofmann-Apitius

Journal volume & issue
Vol. 4
p. 100078

Abstract

Background: Biomedical knowledge graphs (KGs) have become crucial for describing biological findings in a structured manner. To keep pace with the constant flow of new knowledge, the information they contain must be updated regularly with the latest findings. Natural language processing (NLP) has created new possibilities for automating this upkeep by facilitating information extraction from free text. However, because annotated and labeled biomedical data are limited, developing fully autonomous information extraction systems remains a substantial scientific and technological hurdle. This study explores the methodologies best suited to automatically extracting causal relationships from biomedical literature, with the aim of regularly and rapidly updating disease-specific pathophysiology mechanism KGs.

Methods: Our proposed approach first searches and retrieves PubMed abstracts using the desired terms and keywords. The resulting extension corpora are then passed through the NLP pipeline for automatic information extraction. We then identify triples representing cause-and-effect relationships and encode this content using the Biological Expression Language (BEL). Finally, domain experts assess the completeness, relevance, accuracy, and novelty of the extracted triples.

Results: In our test scenario, which focuses on a KG of Tau protein phosphorylation, our pipeline contributed novel data that were then used to update the KG, leading to the identification of six additional upstream regulators of Tau phosphorylation.

Conclusion: We demonstrate that the NLP-based workflow we created can rapidly update pathophysiology mechanism graphs, enabling production-scale, semi-automated updating of pre-existing, curated mechanism graphs.
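The triple-encoding and novelty-check steps described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `CausalTriple`, the function `novel_triples`, and the restricted relation set are hypothetical names introduced here, and the example entities (GSK3B, CDK5, MAPT) are illustrative placeholders rather than results reported in the paper. The sketch assumes triples arrive already normalized to BEL terms and only shows how they might be rendered as BEL statements and compared against an existing KG.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Assumption: a small subset of BEL causal relations, chosen for illustration.
BEL_RELATIONS = {
    "increases",
    "decreases",
    "directlyIncreases",
    "directlyDecreases",
}


@dataclass(frozen=True)
class CausalTriple:
    """A cause-and-effect triple whose subject and object are BEL terms."""

    subject: str
    relation: str
    obj: str

    def to_bel(self) -> str:
        """Render the triple as a single BEL statement string."""
        if self.relation not in BEL_RELATIONS:
            raise ValueError(f"unsupported relation: {self.relation!r}")
        return f"{self.subject} {self.relation} {self.obj}"


def novel_triples(
    extracted: Iterable[CausalTriple], existing: Iterable[CausalTriple]
) -> List[CausalTriple]:
    """Return extracted triples absent from the KG, in order, without duplicates."""
    seen = set(existing)
    result = []
    for triple in extracted:
        if triple not in seen:
            seen.add(triple)
            result.append(triple)
    return result


# Hypothetical KG content and pipeline output (illustrative only).
TAU_PHOS = "p(HGNC:MAPT, pmod(Ph))"
kg = [CausalTriple("p(HGNC:GSK3B)", "increases", TAU_PHOS)]
extracted = [
    CausalTriple("p(HGNC:GSK3B)", "increases", TAU_PHOS),
    CausalTriple("p(HGNC:CDK5)", "increases", TAU_PHOS),
]

for triple in novel_triples(extracted, kg):
    print(triple.to_bel())
```

In the workflow described above, candidates that pass such a filter would still go to domain experts for review of relevance and accuracy; the exact-match comparison here is a simplification and would miss synonymous or differently namespaced entities.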

Keywords