Comparative Performance of ChatGPT 3.5 and GPT4 on Rhinology Standardized Board Examination Questions

Evan A. Patel; Lindsay Fleischer; Peter Filip; Michael Eggerstedt; Michael Hutz; Elias Michaelides; Pete S. Batra; Bobby A. Tajudeen

doi:10.1002/oto2.164

OTO Open (Apr 2024)

Affiliations

Evan A. Patel: Department of Otorhinolaryngology–Head and Neck Surgery, Rush University Medical Center, Chicago, Illinois, USA
Lindsay Fleischer: Department of Otorhinolaryngology–Head and Neck Surgery, Rush University Medical Center, Chicago, Illinois, USA
Peter Filip: Department of Otorhinolaryngology–Head and Neck Surgery, Rush University Medical Center, Chicago, Illinois, USA
Michael Eggerstedt: Department of Otorhinolaryngology–Head and Neck Surgery, Rush University Medical Center, Chicago, Illinois, USA
Michael Hutz: Department of Otorhinolaryngology–Head and Neck Surgery, Rush University Medical Center, Chicago, Illinois, USA
Elias Michaelides: Department of Otorhinolaryngology–Head and Neck Surgery, Rush University Medical Center, Chicago, Illinois, USA
Pete S. Batra: Department of Otorhinolaryngology–Head and Neck Surgery, Rush University Medical Center, Chicago, Illinois, USA
Bobby A. Tajudeen: Department of Otorhinolaryngology–Head and Neck Surgery, Rush University Medical Center, Chicago, Illinois, USA

DOI: https://doi.org/10.1002/oto2.164
Journal volume & issue: Vol. 8, no. 2

Abstract

Objective: Advances in deep learning and artificial intelligence (AI) have led to the emergence of large language models (LLM) like ChatGPT from OpenAI. The study aimed to evaluate the performance of ChatGPT 3.5 and GPT4 on Otolaryngology (Rhinology) Standardized Board Examination questions in comparison to Otolaryngology residents.

Methods: This study selected all 127 rhinology standardized questions from www.boardvitals.com, a commonly used study tool by otolaryngology residents preparing for board exams. Ninety‐three text‐based questions were administered to ChatGPT 3.5 and GPT4, and their answers were compared with the average results of the question bank (used primarily by otolaryngology residents). Thirty‐four image‐based questions were provided to GPT4 and underwent the same analysis. Based on the findings of an earlier study, a pass‐fail cutoff was set at the 10th percentile.

Results: On text‐based questions, ChatGPT 3.5 answered correctly 45.2% of the time (8th percentile) (P = .0001), while GPT4 achieved 86.0% (66th percentile) (P = .001). GPT4 answered image‐based questions correctly 64.7% of the time. Projections suggest that ChatGPT 3.5 might not pass the American Board of Otolaryngology Written Question Exam (ABOto WQE), whereas GPT4 stands a strong chance of passing.

Discussion: The older LLM, ChatGPT 3.5, is unlikely to pass the ABOto WQE. However, the advanced GPT4 model exhibits a much higher likelihood of success. This rapid progression in AI indicates its potential future role in otolaryngology education.

Implications for Practice: As AI technology rapidly advances, it may be that AI‐assisted medical education, diagnosis, and treatment planning become commonplace in the medical and surgical landscape.

Level of Evidence: Level 5.
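
The figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' analysis: the raw counts of correct answers are not reported in the abstract, so `implied_correct` infers them from the rounded percentages, and the helper names and the choice to treat a score exactly at the cutoff as a pass are assumptions.

```python
# Figures quoted in the abstract.
TOTAL_QUESTIONS = 127
TEXT_QUESTIONS = 93
IMAGE_QUESTIONS = 34
PASS_CUTOFF_PERCENTILE = 10  # pass-fail cutoff stated in the Methods


def implied_correct(pct, n):
    """Infer the whole-number count of correct answers out of n whose
    share rounds to pct (one decimal place); return None if none fits.
    Hypothetical helper: the abstract reports percentages only."""
    count = round(pct / 100 * n)
    return count if round(100 * count / n, 1) == pct else None


def passes(percentile, cutoff=PASS_CUTOFF_PERCENTILE):
    """Assumption: a percentile at or above the cutoff counts as a pass."""
    return percentile >= cutoff


# The text-based and image-based subsets account for every question.
assert TEXT_QUESTIONS + IMAGE_QUESTIONS == TOTAL_QUESTIONS

print(implied_correct(45.2, TEXT_QUESTIONS))   # ChatGPT 3.5, text-based
print(implied_correct(86.0, TEXT_QUESTIONS))   # GPT4, text-based
print(implied_correct(64.7, IMAGE_QUESTIONS))  # GPT4, image-based
print(passes(8), passes(66))                   # 8th vs 66th percentile
```

Under these assumptions, the reported percentages are consistent with 42 and 80 correct answers out of 93 text-based questions and 22 out of 34 image-based questions, and only the 66th-percentile result clears the 10th-percentile cutoff.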

Published in OTO Open

ISSN: 2473-974X (Online)
Publisher: Wiley
Country of publisher: United States
LCC subjects: Medicine: Otorhinolaryngology; Medicine: Surgery
Website: https://aao-hnsfjournals.onlinelibrary.wiley.com/journal/2473974x
