Neurospine (Mar 2024)
Performance of a Large Language Model in the Generation of Clinical Guidelines for Antibiotic Prophylaxis in Spine Surgery
Abstract
Objective: Large language models, such as chat generative pre-trained transformer (ChatGPT), have great potential for streamlining medical processes and assisting physicians in clinical decision-making. This study aimed to assess the potential of ChatGPT's 2 models (GPT-3.5 and GPT-4.0) to support clinical decision-making by comparing their responses for antibiotic prophylaxis in spine surgery to accepted clinical guidelines.

Methods: ChatGPT models were prompted with questions from the North American Spine Society (NASS) Evidence-Based Clinical Guidelines for Multidisciplinary Spine Care for Antibiotic Prophylaxis in Spine Surgery (2013). Their responses were then compared with the guideline recommendations and assessed for accuracy.

Results: Of the 16 NASS guideline questions concerning antibiotic prophylaxis, 10 responses (62.5%) were accurate for ChatGPT's GPT-3.5 model and 13 (81%) were accurate for GPT-4.0. Twenty-five percent of GPT-3.5 answers were deemed overly confident, while 62.5% of GPT-4.0 answers directly cited the NASS guideline as evidence for the response.

Conclusion: ChatGPT demonstrated an impressive ability to accurately answer clinical questions. The GPT-3.5 model's performance was limited by its tendency to give overly confident responses and its inability to identify the most significant elements in its responses. The GPT-4.0 model's responses were more accurate and frequently cited the NASS guideline as direct evidence. Although GPT-4.0 is still far from perfect, it showed an exceptional ability, relative to GPT-3.5, to extract the most relevant available research. Thus, while ChatGPT has shown far-reaching potential, scrutiny should still be exercised regarding its clinical use at this time.
Keywords