Assessing accuracy of ChatGPT in response to questions from day to day pharmaceutical care in hospitals

Merel van Nuland; Anne-Fleur H. Lobbezoo; Ewoudt M.W. van de Garde; Maikel Herbrink; Inger van Heijl; Tim Bognàr; Jeroen P.A. Houwen; Marloes Dekens; Demi Wannet; Toine Egberts; Paul D. van der Linden

Exploratory Research in Clinical and Social Pharmacy (Sep 2024)

Affiliations

Merel van Nuland: Department of Clinical Pharmacy, Tergooi Medical Center, Hilversum, the Netherlands
Anne-Fleur H. Lobbezoo: Department of Clinical Pharmacy, Tergooi Medical Center, Hilversum, the Netherlands; Department of Pharmacy, St. Antonius Hospital, Utrecht, Nieuwegein, the Netherlands
Ewoudt M.W. van de Garde: Department of Pharmacy, St. Antonius Hospital, Utrecht, Nieuwegein, the Netherlands; Division of Pharmacoepidemiology and Clinical Pharmacology, Department of Pharmaceutical Sciences, Faculty of Science, Utrecht Institute for Pharmaceutical Sciences (UIPS), Utrecht University, Utrecht, the Netherlands
Maikel Herbrink: Department of Clinical Pharmacy, Meander Medical Center, Amersfoort, the Netherlands
Inger van Heijl: Department of Clinical Pharmacy, Tergooi Medical Center, Hilversum, the Netherlands
Tim Bognàr: Department of Clinical Pharmacy, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
Jeroen P.A. Houwen: Department of Clinical Pharmacy, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
Marloes Dekens: Department of Pharmacy, St. Antonius Hospital, Utrecht, Nieuwegein, the Netherlands
Demi Wannet: Department of Clinical Pharmacy, Meander Medical Center, Amersfoort, the Netherlands
Toine Egberts: Division of Pharmacoepidemiology and Clinical Pharmacology, Department of Pharmaceutical Sciences, Faculty of Science, Utrecht Institute for Pharmaceutical Sciences (UIPS), Utrecht University, Utrecht, the Netherlands; Department of Clinical Pharmacy, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
Paul D. van der Linden: Department of Clinical Pharmacy, Tergooi Medical Center, Hilversum, the Netherlands; Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, the Netherlands; Corresponding author at: Laan van Tergooi 2, 1212 VG, Hilversum, the Netherlands.

Journal volume & issue: Vol. 15
p. 100464

Abstract

Background: The advent of Large Language Models (LLMs) such as ChatGPT introduces opportunities within the medical field. Nonetheless, the use of LLMs poses a risk when healthcare practitioners and patients present clinical questions to these programs without a comprehensive understanding of their suitability for clinical contexts.

Objective: The objective of this study was to assess ChatGPT's ability to generate appropriate responses to clinical questions that hospital pharmacists could encounter during routine patient care.

Methods: Thirty questions from 10 different domains within clinical pharmacy were collected during routine care. Questions were presented to ChatGPT in a standardized format, including the patient's age, sex, drug name, dose, and indication. Subsequently, relevant information regarding the specific case was provided, and the prompt was concluded with the query “what would a hospital pharmacist do?”. The impact on accuracy was assessed for each domain by modifying personification to “what would you do?”, presenting the question in Dutch, and regenerating the primary question. All responses were independently evaluated by two senior hospital pharmacists, focusing on whether advice was given, its accuracy, and concordance.

Results: ChatGPT provided advice in response to 77% of questions. For these responses, accuracy and concordance were determined. Responses were correct and complete in 26% of cases, correct but incomplete in 22%, partially correct and partially incorrect in 30%, and completely incorrect in 22%. Reproducibility was poor: only 10% of responses remained consistent when the primary question was regenerated.

Conclusions: While concordance of responses was excellent, accuracy and reproducibility were poor. With the described method, ChatGPT should not be used to address questions encountered by hospital pharmacists during their shifts. However, it is important to acknowledge the limitations of our methodology, including potential biases, which may have influenced the findings.
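
The standardized prompt format described in the Methods could be sketched roughly as follows. This is an illustration only, not the authors' actual tooling: the field names, the `build_prompt` helper, the sentence layout of the patient header, and the example case are all hypothetical. Only the order of elements (patient data, case details, closing query) and the wording of the two closing queries come from the abstract.

```python
from dataclasses import dataclass

# Closing queries quoted in the abstract (capitalized here as sentence starts).
PHARMACIST_QUERY = "What would a hospital pharmacist do?"
SELF_QUERY = "What would you do?"


@dataclass
class ClinicalQuestion:
    # Field names are illustrative assumptions, not the authors' schema.
    age: int
    sex: str
    drug: str
    dose: str
    indication: str
    case_details: str


def build_prompt(q: ClinicalQuestion, query: str = PHARMACIST_QUERY) -> str:
    """Join patient data, case details, and the closing query, in that order."""
    header = (
        f"Patient: {q.age}-year-old {q.sex}. "
        f"Drug: {q.drug}, {q.dose}. "
        f"Indication: {q.indication}."
    )
    return "\n".join([header, q.case_details, query])


# Hypothetical case invented for illustration; not taken from the study.
example = ClinicalQuestion(
    age=72,
    sex="female",
    drug="drug X",
    dose="10 mg once daily",
    indication="indication Y",
    case_details="The patient cannot swallow tablets.",
)
prompt = build_prompt(example)
```

Passing `SELF_QUERY` as the second argument would give the “what would you do?” variant used to test the effect of personification.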

Published in Exploratory Research in Clinical and Social Pharmacy

ISSN: 2667-2766 (Online)
Publisher: Elsevier
Country of publisher: United States
LCC subjects: Medicine: Pharmacy and materia medica
Website: https://www.journals.elsevier.com/exploratory-research-in-clinical-and-social-pharmacy
