OTO Open (Oct 2023)
Assessment of Artificial Intelligence Performance on the Otolaryngology Residency In‐Service Exam
Abstract
Objectives
This study seeks to determine the potential use and reliability of a large language model for answering questions in a subspecialized area of medicine, specifically practice exam questions in otolaryngology–head and neck surgery, and to assess its current efficacy for surgical trainees and learners.
Study Design and Setting
All available questions from a public, paid-access question bank were manually input into ChatGPT.
Methods
Outputs from ChatGPT were compared against the benchmark of the answers and explanations from the question bank. Questions were assessed in 2 domains: accuracy and comprehensiveness of explanations.
Results
Overall, our study demonstrates a ChatGPT correct answer rate of 53% and a correct explanation rate of 54%. Answer and explanation accuracy decreased as question difficulty increased.
Conclusion
Currently, artificial intelligence-driven learning platforms are not robust enough to serve as reliable medical education resources for learners in subspecialty-specific patient decision-making scenarios.
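The study entered questions manually through the ChatGPT interface. For illustration only, the following minimal sketch shows how a comparable query-and-score loop could be automated with the OpenAI Python client; the question-bank file, its format, the loader, and the model string are all hypothetical placeholders, not part of the study.

```python
# Illustrative sketch only: the study itself entered questions manually into
# ChatGPT. This automates a comparable query-and-score loop. The question-bank
# path/format and the model string are hypothetical placeholders.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_model(question: str, choices: list[str]) -> str:
    """Send one multiple-choice question and return the model's reply."""
    prompt = question + "\n" + "\n".join(
        f"{letter}. {text}" for letter, text in zip("ABCDE", choices)
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # stand-in for the ChatGPT version used
        messages=[
            {"role": "system",
             "content": "Answer with the single best option letter, then a brief explanation."},
            {"role": "user", "content": prompt},
        ],
    )
    return response.choices[0].message.content

# Hypothetical bank format: [{"question": ..., "choices": [...], "answer": "B"}, ...]
with open("question_bank.json") as f:
    bank = json.load(f)

correct = 0
for item in bank:
    reply = ask_model(item["question"], item["choices"])
    # Crude scoring: check whether the keyed letter leads the reply; the study
    # instead graded answers and explanations against the bank's own key.
    if reply.strip().upper().startswith(item["answer"]):
        correct += 1

print(f"Correct answer rate: {correct / len(bank):.0%}")
```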
Keywords