A strategy for cost-effective large language model use at health system-scale

Eyal Klang; Donald Apakama; Ethan E. Abbott; Akhil Vaid; Joshua Lampert; Ankit Sakhuja; Robert Freeman; Alexander W. Charney; David Reich; Monica Kraft; Girish N. Nadkarni; Benjamin S. Glicksberg

doi:10.1038/s41746-024-01315-1

npj Digital Medicine (Nov 2024)

Affiliations

Eyal Klang: Division of Data-Driven and Digital Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai
Donald Apakama: Department of Emergency Medicine, Icahn School of Medicine at Mount Sinai
Ethan E. Abbott: Division of Data-Driven and Digital Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai
Akhil Vaid: Division of Data-Driven and Digital Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai
Joshua Lampert: Division of Data-Driven and Digital Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai
Ankit Sakhuja: Division of Data-Driven and Digital Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai
Robert Freeman: The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai
Alexander W. Charney: The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai
David Reich: Department of Anesthesiology, Perioperative and Pain Medicine, Icahn School of Medicine at Mount Sinai
Monica Kraft: The Samuel Bronfman Department of Medicine, Icahn School of Medicine at Mount Sinai
Girish N. Nadkarni: Division of Data-Driven and Digital Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai
Benjamin S. Glicksberg: Division of Data-Driven and Digital Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai

DOI: https://doi.org/10.1038/s41746-024-01315-1
Journal volume & issue: Vol. 7, no. 1, pp. 1–12

Abstract

Large language models (LLMs) can optimize clinical workflows; however, the economic and computational challenges of their utilization at the health system scale are underexplored. We evaluated how concatenating queries with multiple clinical notes and tasks simultaneously affects model performance under increasing computational loads. We assessed ten LLMs of different capacities and sizes utilizing real-world patient data. We conducted >300,000 experiments of various task sizes and configurations, measuring accuracy in question-answering and the ability to properly format outputs. Performance deteriorated as the number of questions and notes increased. High-capacity models, like Llama-3-70b, had low failure rates and high accuracies. GPT-4-turbo-128k was similarly resilient across task burdens, but performance deteriorated after 50 tasks at large prompt sizes. After addressing mitigable failures, these two models can concatenate up to 50 simultaneous tasks effectively, with validation on a public medical question-answering dataset. An economic analysis demonstrated up to a 17-fold cost reduction at 50 tasks using concatenation. These results identify the limits of LLMs for effective utilization and highlight avenues for cost-efficiency at the enterprise scale.
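
The cost argument in the abstract can be illustrated with simple token arithmetic. The sketch below is not the authors' economic model: it assumes that cost scales with total tokens, that a shared context (for example a clinical note) is sent once when tasks are concatenated and once per request otherwise, and it ignores any difference between input and output token prices. All function names and token counts are hypothetical and are not taken from the paper.

```python
def tokens_separate(context_tokens: int, question_tokens: int,
                    answer_tokens: int, n_tasks: int) -> int:
    """Total tokens when every task is its own request (context resent each time)."""
    return n_tasks * (context_tokens + question_tokens + answer_tokens)


def tokens_concatenated(context_tokens: int, question_tokens: int,
                        answer_tokens: int, n_tasks: int) -> int:
    """Total tokens when all tasks share one request (context sent once)."""
    return context_tokens + n_tasks * (question_tokens + answer_tokens)


def reduction_factor(context_tokens: int, question_tokens: int,
                     answer_tokens: int, n_tasks: int) -> float:
    """Ratio of separate-request tokens to concatenated-request tokens."""
    separate = tokens_separate(context_tokens, question_tokens,
                               answer_tokens, n_tasks)
    concatenated = tokens_concatenated(context_tokens, question_tokens,
                                       answer_tokens, n_tasks)
    return separate / concatenated


if __name__ == "__main__":
    # Hypothetical sizes chosen only for illustration.
    for n in (1, 10, 50):
        print(n, round(reduction_factor(1000, 20, 10, n), 2))
```

Under these assumptions the reduction factor grows with the number of concatenated tasks and with the size of the shared context relative to each question and answer. The accuracy and formatting failures reported in the abstract are what limit how far this can be pushed in practice.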

Published in npj Digital Medicine

ISSN: 2398-6352 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: https://www.nature.com/npjdigitalmed/

About the journal