Is ChatGPT a trusted source of information for total hip and knee arthroplasty patients?

Benjamin M. Wright; Michael S. Bodnar; Andrew D. Moore; Meghan C. Maseda; Michael P. Kucharik; Connor C. Diaz; Christian M. Schmidt; Hassan R. Mir

doi:10.1302/2633-1462.52.BJO-2023-0113.R1

Bone & Joint Open (Feb 2024)

Affiliations

Benjamin M. Wright: Morsani College of Medicine, University of South Florida, Tampa, Florida, USA
Michael S. Bodnar: Morsani College of Medicine, University of South Florida, Tampa, Florida, USA
Andrew D. Moore: Department of Orthopaedic Surgery, University of South Florida, Tampa, Florida, USA
Meghan C. Maseda: Department of Orthopaedic Surgery, University of South Florida, Tampa, Florida, USA
Michael P. Kucharik: Department of Orthopaedic Surgery, University of South Florida, Tampa, Florida, USA
Connor C. Diaz: Department of Orthopaedic Surgery, University of South Florida, Tampa, Florida, USA
Christian M. Schmidt: Department of Orthopaedic Surgery, University of South Florida, Tampa, Florida, USA
Hassan R. Mir: Orthopaedic Trauma Service, Florida Orthopedic Institute, Tampa, Florida, USA

DOI: https://doi.org/10.1302/2633-1462.52.BJO-2023-0113.R1
Journal volume & issue: Vol. 5, no. 2
pp. 139–146

Abstract

Aims: While internet search engines have been the primary information source for patients’ questions, artificial intelligence large language models like ChatGPT are trending towards becoming the new primary source. The purpose of this study was to determine if ChatGPT can answer patient questions about total hip (THA) and knee arthroplasty (TKA) with consistent accuracy, comprehensiveness, and easy readability.

Methods: We posed the 20 most Google-searched questions about THA and TKA, plus ten additional postoperative questions, to ChatGPT. Each question was asked twice to evaluate for consistency in quality. Following each response, we responded with, “Please explain so it is easier to understand,” to evaluate ChatGPT’s ability to reduce response reading grade level, measured as Flesch-Kincaid Grade Level (FKGL). Five resident physicians rated the 120 responses on 1 to 5 accuracy and comprehensiveness scales. Additionally, they answered a “yes” or “no” question regarding acceptability. Mean scores were calculated for each question, and responses were deemed acceptable if ≥ four raters answered “yes.”

Results: The mean accuracy and comprehensiveness scores were 4.26 (95% confidence interval (CI) 4.19 to 4.33) and 3.79 (95% CI 3.69 to 3.89), respectively. Out of all the responses, 59.2% (71/120; 95% CI 50.0% to 67.7%) were acceptable. ChatGPT was consistent when asked the same question twice, giving no significant difference in accuracy (t = 0.821; p = 0.415), comprehensiveness (t = 1.387; p = 0.171), acceptability (χ² = 1.832; p = 0.176), and FKGL (t = 0.264; p = 0.793). There was a significantly lower FKGL (t = 2.204; p = 0.029) for easier responses (11.14; 95% CI 10.57 to 11.71) than original responses (12.15; 95% CI 11.45 to 12.85).

Conclusion: ChatGPT answered THA and TKA patient questions with accuracy comparable to previous reports of websites, with adequate comprehensiveness, but with limited acceptability as the sole information source. ChatGPT has potential for answering patient questions about THA and TKA, but needs improvement.

Cite this article: Bone Jt Open 2024;5(2):139–146.
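
For readers unfamiliar with the two measures named in the Methods, the following is a minimal Python sketch, not the authors' tooling. It assumes the standard published Flesch-Kincaid Grade Level formula and takes word, sentence, and syllable counts as inputs, because the abstract does not say how the text was counted. The function names `fkgl` and `is_acceptable` are hypothetical and chosen only for illustration.

```python
def fkgl(total_words: int, total_sentences: int, total_syllables: int) -> float:
    """Flesch-Kincaid Grade Level from raw counts (standard formula).

    Assumption: the caller supplies the counts; how the study counted
    words, sentences, and syllables is not stated in the abstract.
    """
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


def is_acceptable(votes: list[bool], threshold: int = 4) -> bool:
    """Rule from the abstract: acceptable if at least four raters answered yes."""
    return sum(votes) >= threshold


# Illustrative inputs only; these are not data from the study.
example_grade = fkgl(total_words=100, total_sentences=5, total_syllables=150)
example_verdict = is_acceptable([True, True, True, True, False])

# The reported acceptability proportion, recomputed from the abstract's counts.
acceptable_percent = round(100 * 71 / 120, 1)
```

A higher FKGL means a higher US school grade is needed to read the text comfortably, so the reported drop from 12.15 to 11.14 corresponds to roughly one grade level.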

Published in Bone & Joint Open

ISSN: 2633-1462 (Online)
Publisher: The British Editorial Society of Bone & Joint Surgery
Country of publisher: United Kingdom
LCC subjects: Medicine: Surgery: Orthopedic surgery
Website: https://online.boneandjoint.org.uk/journal/bjo
