OTO Open (Oct 2023)
Assessment of Artificial Intelligence Performance on the Otolaryngology Residency In‐Service Exam
Abstract
Objectives
This study seeks to determine the potential use and reliability of a large language model for answering questions in a subspecialized area of medicine, specifically practice exam questions in otolaryngology–head and neck surgery, and to assess its current efficacy for surgical trainees and learners.
Study Design and Setting
All available questions from a public, paid-access question bank were manually input into ChatGPT.
Methods
Outputs from ChatGPT were compared against the benchmark of the answers and explanations from the question bank. Questions were assessed in 2 domains: accuracy and comprehensiveness of explanations.
Results
Overall, our study demonstrates a ChatGPT correct answer rate of 53% and a correct explanation rate of 54%. Answer and explanation accuracy decreased as question difficulty increased.
Conclusion
Currently, artificial intelligence-driven learning platforms are not robust enough to serve as reliable medical education resources for learners in subspecialty-specific patient decision-making scenarios.
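The study entered questions manually through the ChatGPT interface. For illustration only, the following minimal sketch shows how a comparable query-and-score loop could be automated with the OpenAI Python client; the question-bank file, its format, the loader, and the model string are all hypothetical placeholders, not part of the study.

```python
# Illustrative sketch only: the study itself entered questions manually into
# ChatGPT. This automates a comparable query-and-score loop. The question-bank
# path/format and the model string are hypothetical placeholders.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_model(question: str, choices: list[str]) -> str:
    """Send one multiple-choice question and return the model's reply."""
    prompt = question + "\n" + "\n".join(
        f"{letter}. {text}" for letter, text in zip("ABCDE", choices)
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # stand-in for the ChatGPT version used
        messages=[
            {"role": "system",
             "content": "Answer with the single best option letter, then a brief explanation."},
            {"role": "user", "content": prompt},
        ],
    )
    return response.choices[0].message.content

# Hypothetical bank format: [{"question": ..., "choices": [...], "answer": "B"}, ...]
with open("question_bank.json") as f:
    bank = json.load(f)

correct = 0
for item in bank:
    reply = ask_model(item["question"], item["choices"])
    # Crude scoring: check whether the keyed letter leads the reply; the study
    # instead graded answers and explanations against the bank's own key.
    if reply.strip().upper().startswith(item["answer"]):
        correct += 1

print(f"Correct answer rate: {correct / len(bank):.0%}")
```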
Keywords