Journal of Clinical Medicine (May 2024)

AI in Hand Surgery: Assessing Large Language Models in the Classification and Management of Hand Injuries

  • Sophia M. Pressman,
  • Sahar Borna,
  • Cesar A. Gomez-Cabello,
  • Syed Ali Haider,
  • Antonio Jorge Forte

DOI
https://doi.org/10.3390/jcm13102832
Journal volume & issue
Vol. 13, no. 10
p. 2832

Abstract

Background: OpenAI’s ChatGPT (San Francisco, CA, USA) and Google’s Gemini (Mountain View, CA, USA) are two large language models that show promise in improving and expediting medical decision making in hand surgery. Evaluating the applications of these models within the field of hand surgery is warranted. This study aims to evaluate ChatGPT-4 and Gemini in classifying hand injuries and recommending treatment. Methods: Gemini and ChatGPT were each given 68 fictionalized clinical vignettes of hand injuries twice. The models were asked to use a specific classification system and recommend surgical or nonsurgical treatment. Classifications were scored based on correctness. Results were analyzed using descriptive statistics, a paired two-tailed t-test, and sensitivity testing. Results: Gemini correctly classified 70.6% of hand injuries and demonstrated superior classification ability over ChatGPT (mean score 1.46 vs. 0.87, p-value [...] Conclusions: Large language models like ChatGPT and Gemini show promise in assisting medical decision making, particularly in hand surgery, with Gemini generally outperforming ChatGPT. These findings emphasize the importance of considering the strengths and limitations of different models when integrating them into clinical practice.

Keywords