Surgery Open Science (Jun 2025)
Dall-E in hand surgery: Exploring the utility of ChatGPT image generation
Abstract
Background: Artificial intelligence (AI) has significantly influenced various medical fields, including plastic surgery. Large language model (LLM) chatbots such as ChatGPT and text-to-image tools such as DALL-E and GPT-4o are gaining broader adoption. This study explores the capabilities and limitations of these tools in hand surgery, focusing on their application in patient and medical education.
Methods: Common search terms were identified from Google Trends data in five categories (“Hand Anatomy”, “Hand Fracture”, “Hand Joint Injury”, “Hand Tumor”, and “Hand Dislocation”) and queried on ChatGPT-4.5 and ChatGPT-3.5. Responses were graded on a 1–5 scale for accuracy and evaluated using the Flesch-Kincaid Grade Level, the Patient Education Materials Assessment Tool (PEMAT), and the DISCERN instrument. GPT-4o, DALL-E 3, and DALL-E 2 were then used to generate visual representations of selected ChatGPT responses in each category, and these images were further evaluated.
Results: ChatGPT-4.5 achieved a DISCERN overall score of 3.80 ± 0.23. Its responses averaged 91.67 ± 0.29 for PEMAT understandability and 54.67 ± 0.55 for actionability. Accuracy was 4.47 ± 0.52, with a Flesch-Kincaid Grade Level of 9.26 ± 1.04. ChatGPT-4.5 consistently outperformed ChatGPT-3.5 across all evaluation metrics. For text-to-image generation, GPT-4o produced more accurate visuals than DALL-E 3 and DALL-E 2.
Conclusions: This study highlights the strengths and limitations of ChatGPT-4.5 and GPT-4o in hand surgery education. While combining accurate text generation with image creation shows promise, these AI tools need further refinement before widespread clinical adoption.
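For readers unfamiliar with the readability metric reported in the abstract, the Flesch-Kincaid Grade Level is conventionally computed as 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The Python sketch below illustrates that formula only. It is not the tool the authors used, the function names are hypothetical, and the syllable counter is a rough vowel-group heuristic assumed for illustration, so its scores may differ from those of dedicated readability software.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count vowel groups, drop a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent unless the word ends in 'le' (e.g. "stable").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Apply the standard Flesch-Kincaid Grade Level formula to plain English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )


if __name__ == "__main__":
    sample = "A fracture is a break in a bone. Most hand fractures heal with a splint."
    print(round(flesch_kincaid_grade(sample), 2))
```

A higher score corresponds to a higher U.S. school grade level, so a chatbot response scoring near 9 would be read as roughly ninth-grade text.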