ChatGPT-4 Surpasses Residents: A Study of Artificial Intelligence Competency in Plastic Surgery In-service Examinations and Its Advancements from ChatGPT-3.5

Shannon S. Hubany, BS; Fernanda D. Scala, MD; Kiana Hashemi, BS; Saumya Kapoor, BS; Julia R. Fedorova, BS; Matthew J. Vaccaro, BS; Rees P. Ridout, BS; Casey C. Hedman, BS; Brian C. Kellogg, MD; Angelo A. Leto Barone, MD

doi:10.1097/GOX.0000000000006136

Plastic and Reconstructive Surgery, Global Open (Sep 2024)

Affiliations

Shannon S. Hubany, BS: University of Central Florida College of Medicine, Orlando, Fla.
Fernanda D. Scala, MD: Division of Craniofacial and Pediatric Plastic Surgery, Nemours Children’s Hospital, Orlando, Fla.
Kiana Hashemi, BS: University of Central Florida College of Medicine, Orlando, Fla.
Saumya Kapoor, BS: University of Central Florida College of Medicine, Orlando, Fla.
Julia R. Fedorova, BS: University of Central Florida College of Medicine, Orlando, Fla.
Matthew J. Vaccaro, BS: University of Central Florida College of Medicine, Orlando, Fla.
Rees P. Ridout, BS: University of Central Florida College of Medicine, Orlando, Fla.
Casey C. Hedman, BS: University of Central Florida College of Medicine, Orlando, Fla.
Brian C. Kellogg, MD: Division of Craniofacial and Pediatric Plastic Surgery, Nemours Children’s Hospital, Orlando, Fla.
Angelo A. Leto Barone, MD: Division of Craniofacial and Pediatric Plastic Surgery, Nemours Children’s Hospital, Orlando, Fla.

DOI: https://doi.org/10.1097/GOX.0000000000006136
Journal volume & issue: Vol. 12, no. 9, p. e6136

Abstract

Background: ChatGPT, launched in 2022 and updated to Generative Pre-trained Transformer 4 (GPT-4) in 2023, is a large language model trained on extensive data, including medical information. This study compares ChatGPT’s performance on Plastic Surgery In-Service Examinations with that of medical residents nationally and with that of its earlier version, ChatGPT-3.5.

Methods: This study reviewed 1500 questions from the Plastic Surgery In-Service Examinations from 2018 to 2023. After excluding image-based, unscored, and inconclusive questions, 1292 were analyzed. The question stem and each multiple-choice answer were entered verbatim into ChatGPT-4.

Results: ChatGPT-4 correctly answered 961 (74.4%) of the included questions. By section, performance was highest in core surgical principles (79.1% correct) and lowest in craniomaxillofacial (69.1%). ChatGPT-4 ranked between the 61st and 97th percentiles compared with all residents. ChatGPT-4 significantly outperformed ChatGPT-3.5 on the 2018–2022 examinations (P < 0.001). ChatGPT-3.5 averaged 55.5% correct, whereas ChatGPT-4 averaged 74%, a mean difference of 18.54 percentage points. In 2021, ChatGPT-3.5 ranked in the 23rd percentile of all residents, whereas ChatGPT-4 ranked in the 97th percentile. ChatGPT-4 outperformed 80.7% of residents on average and scored above the 97th percentile among first-year residents. Its performance was comparable with that of sixth-year integrated residents, ranking in the 55.7th percentile on average. These results show significant improvements in ChatGPT-4’s application of medical knowledge within six months of ChatGPT-3.5’s release.

Conclusion: This study reveals ChatGPT-4’s rapid development, advancing from a first-year medical resident’s level to surpassing independent residents and matching a sixth-year resident’s proficiency.
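The headline figures in the abstract can be checked with simple arithmetic. The short sketch below recomputes the number of excluded questions and the overall ChatGPT-4 accuracy from the counts quoted above. The variable names are illustrative only, and this is not the authors' analysis code.

```python
# Counts quoted in the abstract (variable names are illustrative)
total_reviewed = 1500   # questions reviewed, 2018-2023 examinations
analyzed = 1292         # questions remaining after exclusions
correct = 961           # questions ChatGPT-4 answered correctly

# Questions excluded as image-based, unscored, or inconclusive
excluded = total_reviewed - analyzed

# Overall accuracy on the analyzed questions, as a percentage
accuracy_pct = 100 * correct / analyzed

print(f"Excluded questions: {excluded}")
print(f"ChatGPT-4 accuracy: {accuracy_pct:.1f}%")
```

Run as written, this reports 208 excluded questions and an accuracy of 74.4%, matching the reported result.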

Published in Plastic and Reconstructive Surgery, Global Open

ISSN: 2169-7574 (Online)
Publisher: Wolters Kluwer
Country of publisher: United States
LCC subjects: Medicine: Surgery
Website: http://www.prsgo.com
