Evaluating the efficacy of leading large language models in the Japanese national dental hygienist examination: A comparative analysis of ChatGPT, Bard, and Bing Chat

Shino Yamaguchi; Masaki Morishita; Hikaru Fukuda; Kosuke Muraoka; Taiji Nakamura; Izumi Yoshioka; Inho Soh; Kentaro Ono; Shuji Awano

Journal of Dental Sciences (Oct 2024)

Affiliations

Shino Yamaguchi: School of Oral Health Sciences, Kyushu Dental University, Kitakyushu, Japan
Masaki Morishita: Division of Clinical Education Development and Research, Department of Oral Function, Kyushu Dental University, Kitakyushu, Japan; Health Information Management Office, Kyushu Dental University Hospital, Kitakyushu, Japan; Corresponding author. Kyushu Dental University, Division of Clinical Education Development and Research, Department of Oral Function, 2-6-1 Manazuru, Kokurakita, Kitakyushu 803-8580, Japan.
Hikaru Fukuda: Division of Maxillofacial Surgery, Department of Physical Function, Kyushu Dental University, Kitakyushu, Japan
Kosuke Muraoka: Division of Clinical Education Development and Research, Department of Oral Function, Kyushu Dental University, Kitakyushu, Japan
Taiji Nakamura: Division of Periodontology, Department of Oral Function, Kyushu Dental University, Kitakyushu, Japan
Izumi Yoshioka: Division of Oral Medicine, Department of Physical Function, Kyushu Dental University, Kitakyushu, Japan
Inho Soh: School of Oral Health Sciences, Kyushu Dental University, Kitakyushu, Japan
Kentaro Ono: Division of Physiology, Department of Health Promotion, Kyushu Dental University, Kitakyushu, Japan
Shuji Awano: Division of Clinical Education Development and Research, Department of Oral Function, Kyushu Dental University, Kitakyushu, Japan

Journal volume & issue: Vol. 19, no. 4
pp. 2262 – 2267

Abstract

Background/purpose: Large language models (LLMs) such as OpenAI's ChatGPT, Google's Bard, and Microsoft's Bing Chat have shown potential as educational tools in the medical and dental fields. This study evaluated their effectiveness on questions from the Japanese national dental hygienist examination, focusing on textual information only.

Materials and methods: We analyzed 73 questions from the 32nd Japanese national dental hygienist examination, conducted in March 2023, using four LLMs: ChatGPT-3.5, GPT-4, Bard, and Bing Chat. Each question was assigned to one of nine domains. Standardized prompts were used for all LLMs, and Fisher's exact test was applied for statistical analysis.

Results: GPT-4 achieved the highest accuracy (75.3%), followed by Bing Chat (68.5%), Bard (66.7%), and GPT-3.5 (63.0%). The differences between the LLMs were not statistically significant. Performance varied across question categories, and all models achieved 100% accuracy in the 'Disease mechanism and promotion of recovery process' category. GPT-4 generally outperformed the other models, especially on questions requiring multiple answers.

Conclusion: GPT-4 demonstrated the highest overall accuracy among the LLMs tested, indicating its superior potential as an educational support tool in dental hygiene studies. The study also shows that LLM performance varies across question categories. Although GPT-4 was the most effective model at the time of testing, the capabilities of LLMs in educational settings continue to change and improve.
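The abstract reports accuracy as a percentage of the 73 questions and compares models with Fisher's exact test. The following is a minimal sketch of how one such pairwise comparison could be computed with the Python standard library alone. It assumes each accuracy is simply correct answers divided by 73, so the counts 55 (GPT-4), 50 (Bing Chat), and 46 (GPT-3.5) are back-calculated from the rounded percentages rather than taken from the paper. Bard's 66.7% does not correspond to a whole number out of 73, so it is left out here. The function names are hypothetical, and the paper's actual software and comparison design may differ.

```python
from math import comb


def accuracy_pct(correct: int, total: int) -> float:
    """Percentage of correctly answered questions, rounded to one decimal."""
    return round(100 * correct / total, 1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are models; columns are (correct, incorrect). With all margins fixed,
    the top-left cell follows a hypergeometric distribution, and the p-value
    sums the probabilities of every table no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x, given the fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance keeps tables tied with the observed one
    # from being dropped because of floating-point rounding
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical back-calculated counts: GPT-4 vs GPT-3.5 on 73 questions
N = 73
gpt4_correct, gpt35_correct = 55, 46
p = fisher_exact_two_sided(gpt4_correct, N - gpt4_correct,
                           gpt35_correct, N - gpt35_correct)
print(accuracy_pct(gpt4_correct, N), accuracy_pct(gpt35_correct, N), round(p, 3))
```

Even the largest gap under these assumptions (GPT-4 vs GPT-3.5, nine questions) gives a p-value above 0.05. This agrees with the reported absence of statistically significant differences. With only 73 questions per model, small differences in counts move accuracy by about 1.4 percentage points each, which limits the power of such comparisons.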

Published in Journal of Dental Sciences

ISSN: 1991-7902 (Print)
Publisher: Elsevier
Country of publisher: Taiwan, Province of China
LCC subjects: Medicine: Dentistry
Website: http://www.journals.elsevier.com/journal-of-dental-sciences/