Plastic and Reconstructive Surgery, Global Open (Dec 2023)

Can a Machine Ace the Test? Assessing GPT-4.0’s Precision in Plastic Surgery Board Examinations

  • Abdullah A. Al Qurashi, MBBS,
  • Ibrahim Abdullah S Albalawi,
  • Ibrahim R. Halawani,
  • Alanoud Hammam Asaad,
  • Adnan M. Osama Al Dwehji,
  • Hala Abdullah Almusa,
  • Ruba Ibrahim Alharbi,
  • Hussain Amin Alobaidi,
  • Subhi M. K. Zino Alarki,
  • Fahad K. Aljindan, MBBS, SB-PLAST

DOI
https://doi.org/10.1097/GOX.0000000000005448
Journal volume & issue
Vol. 11, no. 12
p. e5448

Abstract


Background: As artificial intelligence makes rapid inroads across various fields, its value in medical education is becoming increasingly evident. This study evaluates the performance of the GPT-4.0 large language model in responding to plastic surgery board examination questions and explores its potential as a learning tool.

Methods: We used a selection of 50 questions from 19 different chapters of a widely used plastic surgery reference. Responses generated by the GPT-4.0 model were assessed on four parameters: accuracy, clarity, completeness, and conciseness. Correlation analyses were conducted to ascertain the relationship between these parameters and the overall performance of the model.

Results: GPT-4.0 showed strong performance, with high mean scores for accuracy (2.88), clarity (3.00), completeness (2.88), and conciseness (2.92) on a three-point scale. Completeness of the model's responses was significantly correlated with accuracy (P < 0.0001), whereas no significant correlation was found between accuracy and clarity or conciseness. Performance variability across different chapters indicates potential limitations of the model in dealing with certain complex topics in plastic surgery.

Conclusions: The GPT-4.0 model exhibits considerable potential as an auxiliary tool for preparation for plastic surgery board examinations. Despite a few identified limitations, the generally high scores on key parameters suggest the model's ability to provide responses that are accurate, clear, complete, and concise. Future research should focus on enhancing the performance of artificial intelligence models in complex medical topics, further improving their applicability in medical education.
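The abstract describes per-parameter mean scores on a three-point scale and a correlation analysis between parameters. A minimal Python sketch of that kind of analysis follows, under stated assumptions: the ratings are hypothetical toy values (not the study's data), and Spearman rank correlation is assumed because the abstract does not name the correlation method and the ratings are ordinal. The names `scores`, `average_ranks`, `pearson`, and `spearman` are illustrative only, and no significance test (P value) is computed.

```python
from statistics import mean

# Hypothetical toy ratings (NOT the study's data): one 1-3 score per question
# for each of the four parameters named in the abstract.
scores = {
    "accuracy":     [3, 3, 2, 3, 1, 3, 3, 2, 3, 3],
    "clarity":      [3, 3, 3, 3, 2, 3, 3, 3, 3, 3],
    "completeness": [3, 3, 2, 3, 1, 3, 3, 3, 3, 3],
    "conciseness":  [3, 3, 3, 3, 2, 3, 3, 3, 3, 3],
}


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; NaN when either variable has zero variance."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return float("nan")  # correlation is undefined for a constant variable
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    for name, vals in scores.items():
        print(f"{name}: mean {mean(vals):.2f} on a 1-3 scale")
    rho = spearman(scores["completeness"], scores["accuracy"])
    print(f"Spearman rho, completeness vs accuracy: {rho:.3f}")
```

Tied ratings receive the average of their rank positions, which matters on a three-point scale where most responses share the same score. If a parameter never varies (for example, every response rated 3), the correlation is undefined and this sketch returns `nan`.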