Dall-E in hand surgery: Exploring the utility of ChatGPT image generation

Daniel Soroudi; Daniel S. Rouhani; Alap Patel; Ryan Sadjadi; Reta Behnam-Hanona; Nicholas C. Oleck; Israel Falade; Merisa Piper; Scott L. Hansen

doi:10.1016/j.sopen.2025.04.012

Surgery Open Science (Jun 2025)

Affiliations

Daniel Soroudi: University of California San Francisco, School of Medicine, San Francisco, CA, USA
Daniel S. Rouhani: University of California San Francisco, School of Medicine, San Francisco, CA, USA
Alap Patel: University of California San Francisco, Department of Surgery, Division of Plastic and Reconstructive Surgery, San Francisco, CA, USA
Ryan Sadjadi: University of California San Francisco, School of Medicine, San Francisco, CA, USA
Reta Behnam-Hanona: University of California San Francisco, School of Medicine, San Francisco, CA, USA
Nicholas C. Oleck: Division of Plastic Surgery, Duke University Medical Center, Durham, NC, USA
Israel Falade: University of California San Francisco, School of Medicine, San Francisco, CA, USA
Merisa Piper: University of California San Francisco, Department of Surgery, Division of Plastic and Reconstructive Surgery, San Francisco, CA, USA
Scott L. Hansen: University of California San Francisco, Department of Surgery, Division of Plastic and Reconstructive Surgery, San Francisco, CA, USA; Corresponding author at: 505 Parnassus Ave, M-593, Box 0932, San Francisco, CA 94143, USA.

DOI: https://doi.org/10.1016/j.sopen.2025.04.012
Journal volume & issue: Vol. 26, pp. 64–78

Abstract

Background: Artificial intelligence (AI) has significantly influenced various medical fields, including plastic surgery. Large language model (LLM) chatbots such as ChatGPT and text-to-image tools such as DALL-E and GPT-4o are gaining broader adoption. This study explores the capabilities and limitations of these tools in hand surgery, focusing on their application in patient and medical education.

Methods: Common search terms were identified from Google Trends data in the following categories: “Hand Anatomy”, “Hand Fracture”, “Hand Joint Injury”, “Hand Tumor”, and “Hand Dislocation”. These terms were queried on ChatGPT-4.5 and ChatGPT-3.5. Responses were graded on a 1–5 scale for accuracy and evaluated using the Flesch-Kincaid Grade Level, the Patient Education Materials Assessment Tool (PEMAT), and the DISCERN instrument. GPT-4o, DALL-E 3, and DALL-E 2 were used to generate visual representations of selected ChatGPT responses in each category, which were then evaluated.

Results: ChatGPT-4.5 achieved a DISCERN overall score of 3.80 ± 0.23. Its responses averaged 91.67 ± 0.29 for PEMAT understandability and 54.67 ± 0.55 for actionability. Accuracy was 4.47 ± 0.52, with a Flesch-Kincaid Grade Level of 9.26 ± 1.04. ChatGPT-4.5 consistently outperformed ChatGPT-3.5 across all evaluation metrics. For text-to-image generation, GPT-4o produced more accurate visuals than DALL-E 3 and DALL-E 2.

Conclusions: This study highlights the strengths and limitations of ChatGPT-4.5 and GPT-4o in hand surgery education. While combining accurate text generation with image creation shows promise, these AI tools still need further refinement before widespread clinical adoption.
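For readers unfamiliar with the readability metric named in the Methods, the Flesch-Kincaid Grade Level is conventionally computed from word, sentence, and syllable counts. The following is a minimal sketch of that standard formula only; the function name and the example counts are illustrative assumptions, and nothing here reflects the tooling the authors actually used.

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Return the Flesch-Kincaid Grade Level for pre-counted text statistics.

    Standard formula:
        0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    Counting words, sentences, and syllables is left to the caller
    (hypothetical inputs; syllable counting in particular is tool-dependent).
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Illustrative counts, not data from the study:
# 100 words in 5 sentences with 150 syllables.
grade = flesch_kincaid_grade(words=100, sentences=5, syllables=150)
print(round(grade, 2))
```

With these illustrative counts the score is about 9.91, that is, roughly a ninth- to tenth-grade reading level, which is the same range as the 9.26 reported for ChatGPT-4.5.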

Published in Surgery Open Science

ISSN: 2589-8450 (Online)
Publisher: Elsevier
Country of publisher: United States
LCC subjects: Medicine: Surgery
Website: https://www.journals.elsevier.com/surgery-open-science/
