Plastic and Reconstructive Surgery, Global Open (Sep 2024)

Class in Session: Analysis of GPT-4-created Plastic Surgery In-service Examination Questions

  • Daniel Najafali, BS
  • Logan G. Galbraith, BA
  • Justin M. Camacho, MBA
  • Victoria Stoffel, MS
  • Isabel Herzog, BA
  • Civanni Moss, BSN, RN
  • Stephanie L. Taiberg, BS
  • Leonard Knoedler, MD

DOI
https://doi.org/10.1097/GOX.0000000000006185
Journal volume & issue
Vol. 12, no. 9
p. e6185

Abstract

Background: The Plastic Surgery In-Service Training Examination (PSITE) remains a critical milestone in residency training. Successful preparation requires extensive studying during an individual’s residency. This study focuses on the capacity of Generative Pre-trained Transformer 4 (GPT-4) to generate PSITE practice questions.

Methods: GPT-4 was prompted to generate multiple-choice questions for each PSITE section and to provide answer choices with a detailed rationale. Question composition was analyzed using readability metrics, and question quality was also assessed. Descriptive statistics were used to compare GPT-4-generated questions with the 2022 PSITE.

Results: The overall median Flesch–Kincaid reading ease for GPT-4-generated questions was 43.90, versus 50.35 for the PSITE (P = 0.036). GPT-4-generated questions contained significantly fewer sentences (mean 1 versus 4) and words (16 versus 56), and a lower percentage of complex words (3 versus 13), than 2022 PSITE questions (P < 0.001). Among GPT-4-generated questions, the highest median Flesch–Kincaid reading ease was in the core surgical principles section (median: 63.30, interquartile range [54.45–68.28]) and the lowest was in the craniomaxillofacial section (median: 36.25, interquartile range [12.57–58.40]). Most readability metrics were higher for the 2022 PSITE than for GPT-4-generated questions. Overall, the quality of the chatbot's questions was poor.

Conclusions: Our study found that GPT-4 can be adapted to generate practice questions for the 2022 PSITE, but its questions are of poor quality. The program can offer general explanations for both the correct and incorrect answer options but was observed to generate false information and poor-quality explanations. Although trainees should navigate with caution as the technology develops, GPT-4 has the potential to serve as an effective educational adjunct under the supervision of trained plastic surgeons.
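
For readers unfamiliar with the reading-ease scores reported above, the following is a minimal sketch of the standard Flesch reading ease formula, 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word), where higher scores indicate easier text. The abstract does not state which tool the authors used, so this is an illustration only: the function names and the vowel-group syllable heuristic are assumptions, and its scores will not exactly match those of dedicated readability software.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per run of vowels, minimum 1."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Standard Flesch reading ease; higher values mean easier text."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    # Hypothetical question stem, not taken from the study.
    stem = "Which flap is most appropriate for this defect?"
    print(round(flesch_reading_ease(stem), 2))
```

Under this formula, shorter sentences raise the score while longer, polysyllabic words lower it, so very short question stems can still score as harder to read if they are dense with technical terms.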