iScience (Dec 2024)

Evaluation of the ability of large language models to self-diagnose oral diseases

  • Shiyang Zhuang,
  • Yuanhao Zeng,
  • Shaojunjie Lin,
  • Xirui Chen,
  • Yishan Xin,
  • Hongyan Li,
  • Yiming Lin,
  • Chaofan Zhang,
  • Yunzhi Lin

Journal volume & issue
Vol. 27, no. 12
p. 111495

Abstract


Summary: Large language models (LLMs) offer potential in primary dental care. We evaluated the diagnostic capabilities of LLMs across various oral diseases and contexts. All LLMs showed diagnostic capability for temporomandibular joint disorders, periodontal disease, dental caries, and malocclusion. Prompt design did not affect the performance of ChatGPT 3.5. When Chinese was used, the diagnostic accuracy of ChatGPT 3.5 for pulpitis improved (0% vs. 61.7%, p < 0.001), while its accuracy for pericoronitis decreased (8% vs. 0%, p < 0.001). For ChatGPT 4.0, accuracy in Chinese improved for both diseases (0% vs. 92% and 8% vs. 72%, respectively; p < 0.001). Claude 2 achieved the highest accuracy in diagnosing pulpitis (36%, p = 0.048), while ChatGPT 4.0 showed complete diagnostic capability for pericoronitis. Llama 2 and Claude 3.5 Sonnet exhibited complete diagnostic capability for oral cancer. In conclusion, LLMs may be a potential tool for daily dental care but need further updates.

Keywords