Mayo Clinic Proceedings: Digital Health (Jun 2024)

Inherent Bias in Large Language Models: A Random Sampling Analysis

  • Noel F. Ayoub, MD, MBA,
  • Karthik Balakrishnan, MD, MPH,
  • Marc S. Ayoub, MD,
  • Thomas F. Barrett, MD,
  • Abel P. David, MD,
  • Stacey T. Gray, MD

Journal volume & issue
Vol. 2, no. 2
pp. 186 – 191

Abstract

Read online

There are mounting concerns regarding inherent bias, safety, and tendency toward misinformation of large language models (LLMs), which could have significant implications in health care. This study sought to determine whether generative artificial intelligence (AI)-based simulations of physicians making life-and-death decisions in a resource-scarce environment would demonstrate bias. Thirteen questions were developed that simulated physicians treating patients in resource-limited environments. Through a random sampling of simulated physicians using OpenAI’s generative pretrained transformer (GPT-4), physicians were tasked with choosing only 1 patient to save owing to limited resources. This simulation was repeated 1000 times per question, representing 1000 unique physicians and patients each. Patients and physicians spanned a variety of demographic characteristics. All patients had similar a priori likelihood of surviving the acute illness. Overall, simulated physicians consistently demonstrated racial, gender, age, political affiliation, and sexual orientation bias in clinical decision-making. Across all demographic characteristics, physicians most frequently favored patients with similar demographic characteristics as themselves, with most pairwise comparisons showing statistical significance (P<.05). Nondescript physicians favored White, male, and young demographic characteristics. The male doctor gravitated toward the male, White, and young, whereas the female doctor typically preferred female, young, and White patients. In addition to saving patients with their own political affiliation, Democratic physicians favored Black and female patients, whereas Republicans preferred White and male demographic characteristics. Heterosexual and gay/lesbian physicians frequently saved patients of similar sexual orientation. Overall, publicly available chatbot LLMs demonstrate significant biases, which may negatively impact patient outcomes if used to support clinical care decisions without appropriate precautions.