Scientific Reports (Jul 2024)

Identifying symptom etiologies using syntactic patterns and large language models

  • Hillel Taub-Tabib,
  • Yosi Shamay,
  • Micah Shlain,
  • Menny Pinhasov,
  • Mark Polak,
  • Aryeh Tiktinsky,
  • Sigal Rahamimov,
  • Dan Bareket,
  • Ben Eyal,
  • Moriya Kassis,
  • Yoav Goldberg,
  • Tal Kaminski Rosenberg,
  • Simon Vulfsons,
  • Maayan Ben Sasson

DOI
https://doi.org/10.1038/s41598-024-65645-6
Journal volume & issue
Vol. 14, no. 1
pp. 1 – 15

Abstract

Differential diagnosis is a crucial aspect of medical practice, as it guides clinicians to accurate diagnoses and effective treatment plans. Traditional resources, such as medical books and services like UpToDate, are constrained by manual curation and may miss novel or less common findings. This paper introduces and analyzes two novel methods to mine etiologies from scientific literature. The first method employs a traditional Natural Language Processing (NLP) approach based on syntactic patterns. By using a novel application of human-guided pattern bootstrapping, patterns are derived quickly, and symptom etiologies are extracted with significant coverage. The second method utilizes generative models, specifically GPT-4, coupled with a fact verification pipeline, marking a pioneering application of generative techniques in etiology extraction. Analysis of this second method shows that while it is highly precise, it offers lower coverage than the syntactic approach. Importantly, combining both methodologies yields synergistic outcomes, enhancing the depth and reliability of etiology mining.