JBJS Open Access (Jun 2024)

ChatGPT Is Moderately Accurate in Providing a General Overview of Orthopaedic Conditions

  • Chandler A. Sparks, MS,
  • Sydney M. Fasulo, MD,
  • Jordan T. Windsor, BS,
  • Vita Bankauskas, BA,
  • Edward V. Contrada, BS,
  • Matthew J. Kraeutler, MD,
  • Anthony J. Scillia, MD

DOI
https://doi.org/10.2106/JBJS.OA.23.00129
Journal volume & issue
Vol. 9, no. 2

Abstract

Background: ChatGPT is an artificial intelligence chatbot capable of providing human-like responses for virtually every possible inquiry. This advancement has provoked public interest regarding the use of ChatGPT, including in health care. The purpose of the present study was to investigate the quantity and accuracy of ChatGPT outputs for general patient-focused inquiries regarding 40 orthopaedic conditions.

Methods: For each of the 40 conditions, ChatGPT (GPT-3.5) was prompted with the text “I have been diagnosed with [condition]. Can you tell me more about it?” The numbers of treatment options, risk factors, and symptoms given for each condition were compared with the number in the corresponding American Academy of Orthopaedic Surgeons (AAOS) OrthoInfo website article for information quantity assessment. For accuracy assessment, an attending orthopaedic surgeon ranked the outputs in the categories of <50%, 50% to 74%, 75% to 99%, and 100% accurate. An orthopaedics sports medicine fellow also independently ranked output accuracy.

Results: Compared with the AAOS OrthoInfo website, ChatGPT provided significantly fewer treatment options (mean difference, −2.5; p < 0.001) and risk factors (mean difference, −1.1; p = 0.02) but did not differ in the number of symptoms given (mean difference, −0.5; p = 0.31). The surgical treatment options given by ChatGPT were often nondescript (n = 20 outputs), such as “surgery” as the only operative treatment option. Regarding accuracy, most conditions (26 of 40; 65%) were ranked as mostly (75% to 99%) accurate, with the others (14 of 40; 35%) ranked as moderately (50% to 74%) accurate, by an attending surgeon. Neither surgeon ranked any condition as mostly inaccurate (<50% accurate). Interobserver agreement between accuracy ratings was poor (κ = 0.03; p = 0.30).

Conclusions: ChatGPT provides at least moderately accurate outputs for general inquiries of orthopaedic conditions but is lacking in the quantity of information it provides for risk factors and treatment options. Professional organizations, such as the AAOS, are the preferred source of musculoskeletal information when compared with ChatGPT.

Clinical Relevance: ChatGPT is an emerging technology with potential roles and limitations in patient education that are still being explored.
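
The interobserver agreement in the Results is reported as a kappa statistic. As a rough illustration of what that number measures, the sketch below computes an unweighted Cohen's kappa for two raters. The abstract does not state which kappa variant the authors used, so the unweighted form is an assumption, and the ratings shown are invented for illustration only; they are not the study's data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: fraction of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: sum over labels of the product of each rater's
    # marginal frequency for that label.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if p_chance == 1.0:
        # Both raters used a single identical label throughout; kappa is
        # undefined (0/0), so this sketch returns 1.0 by convention.
        return 1.0
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical accuracy categories for six outputs (NOT the study's ratings).
rater_1 = ["75-99", "75-99", "50-74", "75-99", "50-74", "75-99"]
rater_2 = ["75-99", "50-74", "75-99", "75-99", "100", "75-99"]

kappa = cohen_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f}")  # prints "kappa = 0.00"
```

In this made-up example the two raters agree on half of the outputs, yet kappa is 0 because that much agreement is expected by chance given how often each rater used each category. This is how a kappa near zero can coexist with both raters placing most outputs in similar accuracy bands.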