Quantitative evaluation of GPT-4’s performance on US and Chinese osteoarthritis treatment guideline interpretation and orthopaedic case consultation

Xiang Gao; Xu Li; Juntan Li; Tianxu Dou; Yuyang Gao; Wannan Zhu

doi:10.1136/bmjopen-2023-082344

BMJ Open (Dec 2024)

Affiliations

Xiang Gao: Public Health Education, UNC Greensboro, Greensboro, North Carolina, USA
Xu Li: Department of Anesthesiology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
Juntan Li: Jinzhou Medical University, Jinzhou, Liaoning, China
Tianxu Dou: Department of Orthopedics, The First Hospital of China Medical University, Shenyang, China
Yuyang Gao: Department of Orthopedics, The First Hospital of China Medical University, Shenyang, China
Wannan Zhu: Jinzhou Medical University, Jinzhou, Liaoning, China

Journal volume & issue: Vol. 14, no. 12

Abstract

Objectives: To evaluate GPT-4’s performance in interpreting osteoarthritis (OA) treatment guidelines from the USA and China, and to assess its ability to diagnose and manage orthopaedic cases.

Setting: The study was conducted using publicly available OA treatment guidelines and simulated orthopaedic case scenarios.

Participants: No human participants were involved. The evaluation focused on GPT-4’s responses to guideline-based and case-based questions, which were assessed by two orthopaedic specialists.

Outcomes: Primary outcomes were the accuracy and completeness of GPT-4’s responses to guideline-based queries and case scenarios. Metrics included the correct match rate, the completeness score and the stratification of case responses into predefined tiers of correctness.

Results: In interpreting the American Academy of Orthopaedic Surgeons and Chinese OA guidelines, GPT-4 achieved a correct match rate of 46.4% and complete agreement with all score-2 recommendations. The accuracy score for guideline interpretation was 4.3±1.6 (95% CI 3.9 to 4.7), and the completeness score was 2.8±0.6 (95% CI 2.5 to 3.1). For case-based questions, GPT-4 performed well, with over 88% of responses rated as comprehensive.

Conclusions: GPT-4 shows promise as an auxiliary tool in orthopaedic clinical practice and patient education, with high levels of accuracy and completeness in guideline interpretation and clinical case analysis. However, further validation is needed to establish its utility in real-world clinical settings.

Published in BMJ Open

ISSN: 2044-6055 (Online)
Publisher: BMJ Publishing Group
Country of publisher: United Kingdom
LCC subjects: Medicine
Website: https://bmjopen.bmj.com