Use of ChatGPT for Determining Clinical and Surgical Treatment of Lumbar Disc Herniation With Radiculopathy: A North American Spine Society Guideline Comparison

Mateo Restrepo Mejia; Juan Sebastian Arroyave; Michael Saturno; Laura Chelsea Mazudie Ndjonko; Bashar Zaidat; Rami Rajjoub; Wasil Ahmed; Ivan Zapolsky; Samuel K. Cho

doi:10.14245/ns.2347052.526

Neurospine (Mar 2024)

Affiliations

Mateo Restrepo Mejia: Department of Orthopedic Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Juan Sebastian Arroyave: Department of Orthopedic Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Michael Saturno: Department of Orthopedic Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Laura Chelsea Mazudie Ndjonko: Department of Orthopedic Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Bashar Zaidat: Department of Orthopedic Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Rami Rajjoub: Department of Orthopedic Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Wasil Ahmed: Department of Orthopedic Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Ivan Zapolsky: Department of Orthopedic Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Samuel K. Cho

DOI: https://doi.org/10.14245/ns.2347052.526
Journal volume & issue: Vol. 21, no. 1
pp. 149–158

Abstract

Objective: Large language models such as Chat Generative Pre-trained Transformer (ChatGPT) have been applied successfully in many sectors, but their use in medicine remains limited. This study assessed the feasibility of using ChatGPT to provide accurate medical information to patients, specifically evaluating how well ChatGPT versions 3.5 and 4 aligned with the 2012 North American Spine Society (NASS) guidelines for lumbar disc herniation with radiculopathy.

Methods: ChatGPT's responses to questions based on the NASS guidelines were analyzed for accuracy. Three additional categories (overconclusiveness, supplementary information, and incompleteness) were introduced to deepen the analysis. Overconclusiveness referred to recommendations not mentioned in the NASS guidelines, supplementary information denoted additional relevant details, and incompleteness indicated the omission of crucial information contained in the NASS guidelines.

Results: Of the 29 clinical guidelines evaluated, ChatGPT-3.5 gave accurate responses for 15 (52%) and ChatGPT-4 for 17 (59%). ChatGPT-3.5 was overconclusive in 14 responses (48%) and ChatGPT-4 in 13 (45%). ChatGPT-3.5 provided supplementary information in 24 responses (83%) and ChatGPT-4 in 27 (93%). ChatGPT-3.5 was incomplete in 11 responses (38%) and ChatGPT-4 in 8 (28%).

Conclusion: ChatGPT shows promise for clinical decision-making, but both patients and healthcare providers should exercise caution to ensure safety and quality of care. While these results are encouraging, further research is necessary to validate the use of large language models in clinical settings.
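Each proportion in the Results is a share of the 29 guideline questions. The following is a minimal Python sketch that recomputes those shares from the reported counts, assuming each percentage is the count divided by 29 and rounded to the nearest whole percent. The dictionary layout and the names `counts`, `percent`, and `summary_table` are illustrative only and are not part of the study's methods.

```python
# Counts reported in the abstract; denominator is the 29 NASS guideline questions.
TOTAL_GUIDELINES = 29

# Illustrative layout (hypothetical): category -> model -> number of responses.
# The categories are not mutually exclusive, so one response may be counted in
# several of them and a model's shares need not sum to 100%.
counts = {
    "accurate": {"ChatGPT-3.5": 15, "ChatGPT-4": 17},
    "overconclusive": {"ChatGPT-3.5": 14, "ChatGPT-4": 13},
    "supplementary": {"ChatGPT-3.5": 24, "ChatGPT-4": 27},
    "incomplete": {"ChatGPT-3.5": 11, "ChatGPT-4": 8},
}


def percent(count: int, total: int = TOTAL_GUIDELINES) -> int:
    """Return count/total as a whole-number percentage, rounded to nearest."""
    return round(100 * count / total)


def summary_table() -> dict:
    """Map each category and model to its rounded percentage of responses."""
    return {
        category: {model: percent(n) for model, n in by_model.items()}
        for category, by_model in counts.items()
    }


if __name__ == "__main__":
    # Print one line per category, e.g. "accurate: ChatGPT-3.5: 15/29 (52%), ..."
    for category, by_model in summary_table().items():
        cells = ", ".join(
            f"{model}: {counts[category][model]}/{TOTAL_GUIDELINES} ({pct}%)"
            for model, pct in by_model.items()
        )
        print(f"{category}: {cells}")
```

Keeping the raw counts alongside the denominator makes it easy to check that each reported percentage is consistent with its count.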

Published in Neurospine

ISSN: 2586-6583 (Print); 2586-6591 (Online)
Publisher: Korean Spinal Neurosurgery Society
Country of publisher: Korea, Republic of
LCC subjects: Medicine: Internal medicine: Neurosciences. Biological psychiatry. Neuropsychiatry: Neurology. Diseases of the nervous system
Website: https://www.e-neurospine.org/
