AI in Hand Surgery: Assessing Large Language Models in the Classification and Management of Hand Injuries

Sophia M. Pressman; Sahar Borna; Cesar A. Gomez-Cabello; Syed Ali Haider; Antonio Jorge Forte

doi:10.3390/jcm13102832

Journal of Clinical Medicine (May 2024)

Affiliations

Sophia M. Pressman: Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA
Sahar Borna: Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA
Cesar A. Gomez-Cabello: Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA
Syed Ali Haider: Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA
Antonio Jorge Forte: Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA

DOI: https://doi.org/10.3390/jcm13102832
Journal volume & issue: Vol. 13, no. 10, p. 2832

Abstract

Background: OpenAI’s ChatGPT (San Francisco, CA, USA) and Google’s Gemini (Mountain View, CA, USA) are two large language models that show promise in improving and expediting medical decision making in hand surgery. Evaluating the applications of these models within the field of hand surgery is warranted. This study aims to evaluate ChatGPT-4 and Gemini in classifying hand injuries and recommending treatment. Methods: Gemini and ChatGPT were each presented twice with 68 fictionalized clinical vignettes of hand injuries. The models were asked to use a specific classification system and to recommend surgical or nonsurgical treatment. Classifications were scored based on correctness. Results were analyzed using descriptive statistics, a paired two-tailed t-test, and sensitivity testing. Results: Gemini, which correctly classified 70.6% of hand injuries, demonstrated superior classification ability over ChatGPT (mean score 1.46 vs. 0.87, p-value […] Conclusions: Large language models like ChatGPT and Gemini show promise in assisting medical decision making, particularly in hand surgery, with Gemini generally outperforming ChatGPT. These findings emphasize the importance of considering the strengths and limitations of different models when integrating them into clinical practice.

Published in Journal of Clinical Medicine

ISSN: 2077-0383 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Medicine
Website: http://www.mdpi.com/journal/jcm
