Saudi Dental Journal (Dec 2024)

An exploratory assessment of GPT-4o and GPT-4 performance on the Japanese National Dental Examination

  • Masaki Morishita,
  • Hikaru Fukuda,
  • Shino Yamaguchi,
  • Kosuke Muraoka,
  • Taiji Nakamura,
  • Masanari Hayashi,
  • Izumi Yoshioka,
  • Kentaro Ono,
  • Shuji Awano

Journal volume & issue
Vol. 36, no. 12
pp. 1577 – 1581

Abstract


Background and Objectives: Multiple large language models (LLMs) have been released since 2022, including OpenAI’s GPT-3.5 and GPT-4. The latest model, GPT-4o, introduced on May 13, 2024, is a significant improvement over GPT-4. Previous studies have shown the potential of LLMs as educational tools for medical and dental examinations. This study evaluates the accuracy of GPT-4 and GPT-4o responses to questions from the Japanese National Dental Examination (JNDE) to assess their potential as tools for dental education.

Materials and methods: We obtained the dataset of the 117th JNDE, administered in January 2024, which consists of 360 questions. After excluding questions containing images and questions deemed inappropriate, 202 questions were selected. GPT-4 and GPT-4o were used to generate responses, with standardized prompts ensuring consistent input. Data were analyzed with Qlik Sense® and GraphPad Prism using Fisher’s exact test.

Results: GPT-4o achieved a significantly higher correct response rate (73.8%) than GPT-4 (63.3%). In the compulsory section, GPT-4o achieved 88.6% accuracy, significantly higher than GPT-4’s 74.3%. In the general section, GPT-4o also scored higher (66.4%) than GPT-4 (58.0%), although this difference was not statistically significant.

Conclusion: GPT-4o significantly outperformed GPT-4 in accuracy on JNDE questions, suggesting greater potential as an educational tool in dental education. Further studies are needed to evaluate GPT-4o’s capabilities with visual materials and on more diverse question sets to fully ascertain its utility in educational settings.
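The comparison of correct-response rates rests on Fisher’s exact test applied to a 2×2 table (model × correct/incorrect). As a minimal, self-contained sketch, and not the authors’ analysis (which used GraphPad Prism), the two-sided p-value can be computed directly from the hypergeometric distribution. The function names `accuracy_pct` and `fisher_exact_two_sided` and the counts in the usage section are hypothetical and do not reproduce the study’s data.

```python
from math import comb


def accuracy_pct(correct: int, total: int) -> float:
    """Correct-response rate as a percentage."""
    return 100 * correct / total


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table

        [[a, b],    model 1: correct, incorrect
         [c, d]]    model 2: correct, incorrect
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def table_prob(x: int) -> float:
        # Hypergeometric probability of a table whose top-left cell is x
        # and whose row/column totals match the observed table.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = table_prob(a)
    tol = 1 + 1e-7  # relative tolerance so numerically tied tables are counted
    p_value = 0.0
    # Sum the probabilities of all feasible tables no more likely than the observed one.
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = table_prob(x)
        if p <= p_obs * tol:
            p_value += p
    return min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not the study's data):
    # both models answer the same 100 questions.
    n_questions = 100
    correct_a, correct_b = 80, 65
    print(f"Model A: {accuracy_pct(correct_a, n_questions):.1f}%")
    print(f"Model B: {accuracy_pct(correct_b, n_questions):.1f}%")
    p = fisher_exact_two_sided(correct_a, n_questions - correct_a,
                               correct_b, n_questions - correct_b)
    print(f"Two-sided Fisher p = {p:.4f}")
```

The two-sided p-value here sums the probabilities of every table with the same margins that is no more likely than the observed one, which matches the convention of R’s `fisher.test`; whether Prism uses exactly the same two-sided definition is not stated in the source.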

Keywords