Archives of Academic Emergency Medicine (Apr 2025)
ChatGPT-o1 Preview Outperforms ChatGPT-4 as a Diagnostic Support Tool for Ankle Pain Triage in Emergency Settings
Abstract
Introduction: ChatGPT, a general-purpose language model, is not specifically optimized for medical applications. This study aimed to assess the performance of ChatGPT-4 and ChatGPT-o1 preview in generating differential diagnoses for common presentations of ankle pain in emergency settings.

Methods: Common presentations of ankle pain were identified through consultations with an experienced orthopedic surgeon and a review of relevant hospital and social media sources. To replicate typical patient inquiries, questions were phrased in simple, non-technical language, each requesting three possible differential diagnoses. In the second phase, case vignettes were designed to reflect scenarios typically encountered by triage nurses or physicians. Responses from both models were evaluated against a benchmark established by two experienced orthopedic surgeons, using a scoring system that assessed the accuracy, clarity, and relevance of the differential diagnoses according to their order.

Results: Across 21 ankle pain presentations, ChatGPT-o1 preview outperformed ChatGPT-4 in both accuracy and clarity, although only the difference in clarity scores reached statistical significance (p < 0.001). ChatGPT-o1 preview also achieved a significantly higher total score (p = 0.004). In 15 case vignettes, ChatGPT-o1 preview scored better on diagnostic and management clarity, though differences in diagnostic accuracy were not statistically significant. Among the 51 questions, ChatGPT-4 and ChatGPT-o1 preview produced incorrect responses for 5 (9.8%) and 4 (7.8%), respectively. Inter-rater reliability analysis demonstrated excellent reliability of the scoring system, with intraclass correlation coefficients of 0.99 (95% CI, 0.998–0.999) for accuracy scores and 0.99 (95% CI, 0.990–0.995) for clarity scores.

Conclusion: Both ChatGPT-4 and ChatGPT-o1 preview showed acceptable performance in the triage of ankle pain cases in emergency settings. ChatGPT-o1 preview outperformed ChatGPT-4, offering clearer and more precise responses. While both models show potential as supportive tools, their role should remain supervised and strictly supplementary to clinical expertise.
Keywords