Inherent Bias in Large Language Models: A Random Sampling Analysis

Noel F. Ayoub, MD, MBA; Karthik Balakrishnan, MD, MPH; Marc S. Ayoub, MD; Thomas F. Barrett, MD; Abel P. David, MD; Stacey T. Gray, MD

Mayo Clinic Proceedings: Digital Health (Jun 2024)

Inherent Bias in Large Language Models: A Random Sampling Analysis

Noel F. Ayoub, MD, MBA,
Karthik Balakrishnan, MD, MPH,
Marc S. Ayoub, MD,
Thomas F. Barrett, MD,
Abel P. David, MD,
Stacey T. Gray, MD

Affiliations

Noel F. Ayoub, MD, MBA: Division of Rhinology and Skull Base Surgery, Department of Otolaryngology--Head & Neck Surgery, Mass Eye and Ear/Harvard Medical School, Boston, MA; Correspondence: Address to Noel Ayoub, MD, MBA, Division of Rhinology and Skull Base Surgery, Department of Otolaryngology-Head and Neck Surgery, Mass Eye and Ear/Harvard Medical School, 243 Charles Street, Boston, MA 02114.
Karthik Balakrishnan, MD, MPH: Division of Pediatric Otolaryngology, Department of Otolaryngology-Head & Neck Surgery, Stanford University School of Medicine, Palo Alto, CA
Marc S. Ayoub, MD: Department of Neurosurgery, Lennox Hill, Northwell Health, New York, NY
Thomas F. Barrett, MD: Department of Otolaryngology-Head & Neck Surgery, Washington University in St. Louis, St. Louis, MO
Abel P. David, MD: Division of Otology and Neurotology, Mass Eye and Ear, Boston, MA
Stacey T. Gray, MD: Division of Rhinology and Skull Base Surgery, Department of Otolaryngology--Head & Neck Surgery, Mass Eye and Ear/Harvard Medical School, Boston, MA

Journal volume & issue: Vol. 2, no. 2
pp. 186 – 191

Abstract

Read online

There are mounting concerns regarding inherent bias, safety, and tendency toward misinformation of large language models (LLMs), which could have significant implications in health care. This study sought to determine whether generative artificial intelligence (AI)-based simulations of physicians making life-and-death decisions in a resource-scarce environment would demonstrate bias. Thirteen questions were developed that simulated physicians treating patients in resource-limited environments. Through a random sampling of simulated physicians using OpenAI’s generative pretrained transformer (GPT-4), physicians were tasked with choosing only 1 patient to save owing to limited resources. This simulation was repeated 1000 times per question, representing 1000 unique physicians and patients each. Patients and physicians spanned a variety of demographic characteristics. All patients had similar a priori likelihood of surviving the acute illness. Overall, simulated physicians consistently demonstrated racial, gender, age, political affiliation, and sexual orientation bias in clinical decision-making. Across all demographic characteristics, physicians most frequently favored patients with similar demographic characteristics as themselves, with most pairwise comparisons showing statistical significance (P<.05). Nondescript physicians favored White, male, and young demographic characteristics. The male doctor gravitated toward the male, White, and young, whereas the female doctor typically preferred female, young, and White patients. In addition to saving patients with their own political affiliation, Democratic physicians favored Black and female patients, whereas Republicans preferred White and male demographic characteristics. Heterosexual and gay/lesbian physicians frequently saved patients of similar sexual orientation. Overall, publicly available chatbot LLMs demonstrate significant biases, which may negatively impact patient outcomes if used to support clinical care decisions without appropriate precautions.

Published in Mayo Clinic Proceedings: Digital Health

ISSN: 2949-7612 (Online)
Publisher: Elsevier
Country of publisher: United States
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: https://www.sciencedirect.com/journal/mayo-clinic-proceedings-digital-health

About the journal