Foot & Ankle Surgery: Techniques, Reports & Cases (Jan 2025)
Integrating domain-specific resources: Advancing AI for foot and ankle surgery
Abstract
Large language models such as ChatGPT offer significant potential for applications in medicine, including patient education and clinical support. This study evaluates the performance of ChatGPT-4, ChatGPT-4 enhanced with retrieval-augmented generation (RAG), and Gemini AI in responding to clinical vignette questions regarding hallux rigidus, a condition requiring specialized knowledge in foot and ankle surgery. The ChatGPT-4 + RAG model, augmented with the 2024 ACFAS clinical consensus statements, demonstrated the highest agreement with surveyor majority responses (83.26%), compared with ChatGPT-4 (59.54%) and Gemini AI (53.02%). All models provided clinically appropriate responses to most questions. The ChatGPT-4 + RAG model was the most accurate, although the rationales it gave for its answers were rated the most difficult to read. These findings highlight the limitations of generic AI models, which may propagate misinformation if used by patients seeking health information. By incorporating domain-specific resources, the RAG-augmented model showed enhanced reliability and contextual accuracy, suggesting the potential of such models as tools for both clinical decision-making and patient education. This study emphasizes the importance of integrating verified medical resources to advance AI in healthcare, addressing critical gaps in existing capabilities while minimizing risks of misinformation.
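As an illustration of the headline metric, agreement with surveyor majority responses can be computed as the share of vignette questions on which a model's answer matches the most common surveyor answer. The sketch below is only one plausible way to compute such a figure. The function names, the question identifiers, and the toy data are hypothetical and are not taken from the study, which does not describe its scoring code here.

```python
from collections import Counter


def majority_response(responses):
    """Return the most common surveyor answer for one question.

    Assumption: ties are broken by whichever answer appears first;
    the study's actual tie-handling rule is not stated in the abstract.
    """
    return Counter(responses).most_common(1)[0][0]


def percent_agreement(model_answers, surveyor_answers):
    """Percentage of questions where the model matches the surveyor majority."""
    matches = sum(
        model_answers[q] == majority_response(surveyor_answers[q])
        for q in model_answers
    )
    return 100.0 * matches / len(model_answers)


# Hypothetical toy data: four questions, three surveyors each.
surveyor_answers = {
    "q1": ["A", "A", "B"],
    "q2": ["C", "C", "C"],
    "q3": ["B", "D", "B"],
    "q4": ["A", "D", "D"],
}
model_answers = {"q1": "A", "q2": "C", "q3": "D", "q4": "D"}

agreement = percent_agreement(model_answers, surveyor_answers)
print(f"{agreement:.2f}%")  # the model matches the majority on q1, q2, and q4
```

In this toy case the model agrees with the majority on three of four questions. The same calculation applied per model would yield figures comparable in form to those reported above.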