npj Digital Medicine (Feb 2025)

Multiple large language models versus experienced physicians in diagnosing challenging cases with gastrointestinal symptoms

  • Xintian Yang,
  • Tongxin Li,
  • Han Wang,
  • Rongchun Zhang,
  • Zhi Ni,
  • Na Liu,
  • Huihong Zhai,
  • Jianghai Zhao,
  • Fandong Meng,
  • Zhongyin Zhou,
  • Shanhong Tang,
  • Limei Wang,
  • Xiangping Wang,
  • Hui Luo,
  • Gui Ren,
  • Linhui Zhang,
  • Xiaoyu Kang,
  • Jun Wang,
  • Ning Bo,
  • Xiaoning Yang,
  • Weijie Xue,
  • Xiaoyin Zhang,
  • Ning Chen,
  • Rui Guo,
  • Baiwen Li,
  • Yajun Li,
  • Yaling Liu,
  • Tiantian Zhang,
  • Shuhui Liang,
  • Yong Lv,
  • Yongzhan Nie,
  • Daiming Fan,
  • Lina Zhao,
  • Yanglin Pan

DOI
https://doi.org/10.1038/s41746-025-01486-5
Journal volume & issue
Vol. 8, no. 1
pp. 1 – 12

Abstract

Faced with challenging cases, doctors are increasingly seeking diagnostic advice from large language models (LLMs). This study aims to compare the ability of LLMs and human physicians to diagnose challenging cases. An offline dataset of 67 challenging cases with primary gastrointestinal symptoms was used to solicit possible diagnoses from seven LLMs and 22 gastroenterologists. The diagnoses by Claude 3.5 Sonnet covered the highest proportion (95% confidence interval [CI]) of instructive diagnoses (76.1% [70.6%–80.9%]), significantly surpassing all the gastroenterologists (p < 0.05 for all). Claude 3.5 Sonnet also achieved a significantly higher coverage rate (95% CI) than the gastroenterologists using search engines or other traditional resources (76.1% [70.6%–80.9%] vs. 45.5% [40.7%–50.4%], p < 0.001). The study highlights that advanced LLMs may assist gastroenterologists with instructive, time-saving, and cost-effective diagnostic scopes in challenging cases.