Evaluation of Artificial Intelligence–generated Responses to Common Plastic Surgery Questions

Libby R. Copeland-Halperin, MD; Lauren O’Brien, BSN, RN; Michelle Copeland, DMD, MD, FACS

doi:10.1097/GOX.0000000000005226

Plastic and Reconstructive Surgery, Global Open (Aug 2023)

Affiliations

Libby R. Copeland-Halperin, MD: Northwell Health, New York, N.Y.
Lauren O’Brien, BSN, RN: Michelle Copeland DMD MD PC, New York, N.Y.
Michelle Copeland, DMD, MD, FACS: Icahn School of Medicine at Mount Sinai, New York, N.Y.

DOI: https://doi.org/10.1097/GOX.0000000000005226
Journal volume & issue: Vol. 11, no. 8, p. e5226

Abstract

Background: Artificial intelligence (AI) is increasingly used to answer questions, yet the accuracy and validity of current tools are uncertain. Unlike internet search results, AI-generated summary responses are presented as definitive. The internet is rife with inaccuracies, and plastic surgery management guidelines evolve, making verifiable information important.

Methods: We posed 10 questions about breast implant-associated illness, anaplastic large cell lymphoma, and squamous cell carcinoma to Bing, using the “more balanced” option, and to ChatGPT. Answers were reviewed by two plastic surgeons for accuracy and fidelity to information on the Food and Drug Administration (FDA) and American Society of Plastic Surgeons (ASPS) websites. We also presented 10 multiple-choice questions from the 2022 plastic surgery in-service examination to Bing, using the “more precise” option, and to ChatGPT. Questions were repeated three times over consecutive weeks, and answers were evaluated for accuracy and stability.

Results: Compared with information from the FDA and ASPS, the answers from Bing and ChatGPT were accurate. Of the 30 multiple-choice attempts (10 questions, each posed three times), Bing answered 10 correctly, nine incorrectly, and did not answer 11. ChatGPT answered 16 correctly and 14 incorrectly. In both parts, responses from Bing were shorter and less detailed and referred to both verified and unverified sources; ChatGPT did not provide citations.

Conclusions: These AI tools provided accurate information from the FDA and ASPS websites, but neither consistently gave correct answers to questions requiring nuanced decision-making. Advances in applications to plastic surgery will require algorithms that selectively identify, evaluate, and exclude information to enhance the accuracy, precision, validity, reliability, and utility of AI-generated responses.
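
The multiple-choice counts reported in the Results can be expressed as proportions. The short sketch below is illustrative only and is not part of the published analysis. It assumes that each tool made 30 attempts (10 questions repeated three times) and that ChatGPT left no question unanswered, which is inferred from its 16 correct and 14 incorrect answers summing to 30. The "accuracy when answered" figure and the `summarize` helper are hypothetical additions, not measures reported by the authors.

```python
# Counts copied from the Results paragraph of the abstract.
# Assumption: 10 questions x 3 weekly repetitions = 30 attempts per tool.
TOTAL_ATTEMPTS = 10 * 3

counts = {
    "Bing": {"correct": 10, "incorrect": 9, "unanswered": 11},
    # Assumption: 16 + 14 = 30, so ChatGPT answered every attempt.
    "ChatGPT": {"correct": 16, "incorrect": 14, "unanswered": 0},
}


def summarize(tally):
    """Return total attempts and two accuracy proportions for one tool."""
    attempted = tally["correct"] + tally["incorrect"]
    total = attempted + tally["unanswered"]
    return {
        "total": total,
        # Share of all attempts answered correctly.
        "overall_accuracy": tally["correct"] / total,
        # Share of answered attempts that were correct (illustrative measure).
        "accuracy_when_answered": tally["correct"] / attempted,
    }


for tool, tally in counts.items():
    s = summarize(tally)
    # Sanity check: the reported counts should add up to 30 attempts.
    assert s["total"] == TOTAL_ATTEMPTS
    print(
        f"{tool}: {s['overall_accuracy']:.1%} of all attempts correct, "
        f"{s['accuracy_when_answered']:.1%} of answered attempts correct"
    )
```

Under these assumptions, Bing was correct on about one third of all attempts and ChatGPT on slightly more than half. When only answered attempts are counted, the two tools look similar, which suggests that much of the gap comes from the attempts Bing did not answer.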

Published in Plastic and Reconstructive Surgery, Global Open

ISSN: 2169-7574 (Online)
Publisher: Wolters Kluwer
Country of publisher: United States
LCC subjects: Medicine: Surgery
Website: http://www.prsgo.com