npj Digital Medicine (Jun 2025)

Evaluating evidence-based health information from generative AI using a cross-sectional study with laypeople seeking screening information

  • Felix G. Rebitschek,
  • Alessandra Carella,
  • Silja Kohlrausch-Pazin,
  • Michael Zitzmann,
  • Anke Steckelberg,
  • Christoph Wilhelm

DOI
https://doi.org/10.1038/s41746-025-01752-6
Journal volume & issue
Vol. 8, no. 1
pp. 1 – 8

Abstract

Large language models (LLMs) are used to seek health information. Guidelines for evidence-based health communication require the presentation of the best available evidence to support informed decision-making. We investigate the prompt-dependent guideline compliance of LLMs and evaluate a minimal behavioural intervention for boosting laypeople’s prompting. Study 1 systematically varied prompt informedness, topic, and LLM to evaluate compliance. Study 2 randomized 300 participants to three LLMs under standard or boosted prompting conditions. Blinded raters assessed LLM responses with two instruments. Study 1 found that LLMs failed evidence-based health communication standards. Response quality was contingent upon prompt informedness. Study 2 revealed that laypeople frequently generated poor-quality responses. The simple boost improved response quality, though it remained below required standards. These findings underscore the inadequacy of LLMs as a standalone health communication tool. Integrating LLMs with evidence-based frameworks, enhancing their reasoning and interfaces, and teaching prompting skills are essential. Study Registration: German Clinical Trials Register (DRKS) (Reg. No.: DRKS00035228, registered on 15 October 2024).