Red teaming ChatGPT in medicine to yield real-world insights on model behavior

Crystal T. Chang; Hodan Farah; Haiwen Gui; Shawheen Justin Rezaei; Charbel Bou-Khalil; Ye-Jean Park; Akshay Swaminathan; Jesutofunmi A. Omiye; Akaash Kolluri; Akash Chaurasia; Alejandro Lozano; Alice Heiman; Allison Sihan Jia; Amit Kaushal; Angela Jia; Angelica Iacovelli; Archer Yang; Arghavan Salles; Arpita Singhal; Balasubramanian Narasimhan; Benjamin Belai; Benjamin H. Jacobson; Binglan Li; Celeste H. Poe; Chandan Sanghera; Chenming Zheng; Conor Messer; Damien Varid Kettud; Deven Pandya; Dhamanpreet Kaur; Diana Hla; Diba Dindoust; Dominik Moehrle; Duncan Ross; Ellaine Chou; Eric Lin; Fateme Nateghi Haredasht; Ge Cheng; Irena Gao; Jacob Chang; Jake Silberg; Jason A. Fries; Jiapeng Xu; Joe Jamison; John S. Tamaresis; Jonathan H. Chen; Joshua Lazaro; Juan M. Banda; Julie J. Lee; Karen Ebert Matthys; Kirsten R. Steffner; Lu Tian; Luca Pegolotti; Malathi Srinivasan; Maniragav Manimaran; Matthew Schwede; Minghe Zhang; Minh Nguyen; Mohsen Fathzadeh; Qian Zhao; Rika Bajra; Rohit Khurana; Ruhana Azam; Rush Bartlett; Sang T. Truong; Scott L. Fleming; Shriti Raj; Solveig Behr; Sonia Onyeka; Sri Muppidi; Tarek Bandali; Tiffany Y. Eulalio; Wenyuan Chen; Xuanyu Zhou; Yanan Ding; Ying Cui; Yuqi Tan; Yutong Liu; Nigam Shah; Roxana Daneshjou

doi:10.1038/s41746-025-01542-0

npj Digital Medicine (Mar 2025)

Affiliations

Crystal T. Chang: Department of Dermatology, Stanford University
Hodan Farah: Department of Dermatology, Stanford University
Haiwen Gui: Department of Dermatology, Stanford University
Shawheen Justin Rezaei: School of Medicine, Stanford University
Charbel Bou-Khalil: School of Medicine, Stanford University
Ye-Jean Park: Temerty Faculty of Medicine
Akshay Swaminathan: School of Medicine, Stanford University
Jesutofunmi A. Omiye: Department of Dermatology, Stanford University
Akaash Kolluri: Stanford University
Akash Chaurasia: Department of Computer Science, Stanford University
Alejandro Lozano: Department of Biomedical Data Science, Stanford University
Alice Heiman: Stanford University
Allison Sihan Jia: Stanford University
Amit Kaushal: Department of Bioengineering, Stanford University
Angela Jia: Stanford University
Angelica Iacovelli: Department of Pediatrics, Stanford University
Archer Yang: Department of Biomedical Data Science, Stanford University
Arghavan Salles: Stanford University
Arpita Singhal: Department of Computer Science, Stanford University
Balasubramanian Narasimhan: Stanford University
Benjamin Belai: Department of Psychiatry, Stanford University
Benjamin H. Jacobson: School of Medicine, Stanford University
Binglan Li: Department of Biomedical Data Science, Stanford University
Celeste H. Poe: School of Medicine, Stanford University
Chandan Sanghera: Stanford University
Chenming Zheng: School of Medicine, Stanford University
Conor Messer: Stanford University
Damien Varid Kettud: Stanford University
Deven Pandya: Stanford University
Dhamanpreet Kaur: School of Medicine, Stanford University
Diana Hla: Mayo Clinic Alix School of Medicine
Diba Dindoust: Stanford University
Dominik Moehrle: School of Medicine, Stanford University
Duncan Ross: Department of Statistics, Stanford University
Ellaine Chou: Department of Biomedical Data Science, Stanford University
Eric Lin: Veterans Affairs Medical Center
Fateme Nateghi Haredasht: Center for Biomedical Informatics Research, Stanford University
Ge Cheng: Department of Biomedical Data Science, Stanford University
Irena Gao: Stanford University
Jacob Chang: Department of Biomedical Data Science, Stanford University
Jake Silberg: Department of Biomedical Data Science, Stanford University
Jason A. Fries: Center for Biomedical Informatics Research, Stanford University
Jiapeng Xu: Department of Biomedical Data Science, Stanford University
Joe Jamison: Department of Statistics, Stanford University
John S. Tamaresis: Department of Biomedical Data Science, Stanford University
Jonathan H. Chen: Clinical Excellence Research Center, School of Medicine, Stanford University
Joshua Lazaro: Department of Biomedical Data Science, Stanford University
Juan M. Banda: Technology and Digital Solutions, Stanford Health Care
Julie J. Lee: Department of Pediatrics, Stanford University
Karen Ebert Matthys: Department of Biomedical Data Science, Stanford University
Kirsten R. Steffner: Department of Anesthesiology, Stanford University
Lu Tian: Stanford University
Luca Pegolotti: Department of Pediatrics, Stanford University
Malathi Srinivasan: School of Medicine, Stanford University
Maniragav Manimaran: Graduate School of Business, Stanford University
Matthew Schwede: Department of Medicine, Stanford University
Minghe Zhang: Department of Statistics, Stanford University
Minh Nguyen: Stanford University
Mohsen Fathzadeh: Department of Epidemiology and Population Health, Stanford University
Qian Zhao: Department of Biomedical Data Science, Stanford University
Rika Bajra: School of Medicine, Stanford University
Rohit Khurana: Department of Biomedical Data Science, Stanford University
Ruhana Azam: Stanford University
Rush Bartlett: Stanford BioDesign, Stanford University
Sang T. Truong: Department of Computer Science, Stanford University
Scott L. Fleming: Department of Biomedical Data Science, Stanford University
Shriti Raj: Center for Biomedical Informatics Research, Stanford University
Solveig Behr: Department of Education and Psychology, Freie Universität Berlin
Sonia Onyeka: Department of Dermatology, Stanford University
Sri Muppidi: Stanford University
Tarek Bandali: Stanford University
Tiffany Y. Eulalio: Department of Biomedical Data Science, Stanford University
Wenyuan Chen: Department of Biomedical Data Science, Stanford University
Xuanyu Zhou: Department of Epidemiology and Population Health, Stanford University
Yanan Ding: Department of Biomedical Data Science, Stanford University
Ying Cui: Stanford University
Yuqi Tan: Department of Pathology, Stanford University
Yutong Liu: Department of Epidemiology and Population Health, Stanford University
Nigam Shah: School of Medicine, Stanford University
Roxana Daneshjou: Department of Dermatology, Stanford University

Journal volume & issue: Vol. 8, no. 1, pp. 1–10

Abstract

Red teaming, the practice of adversarially exposing unexpected or undesired model behaviors, is critical to improving the equity and accuracy of large language models, but red teaming by groups unaffiliated with model creators is scarce in healthcare. We convened teams of clinicians, medical and engineering students, and technical professionals (80 participants total) to stress-test models with real-world clinical cases and categorize inappropriate responses along axes of safety, privacy, hallucinations/accuracy, and bias. Six medically trained reviewers re-analyzed prompt-response pairs and added qualitative annotations. Of 376 unique prompts (1504 responses), 20.1% were inappropriate (GPT-3.5: 25.8%; GPT-4.0: 16%; GPT-4.0 with Internet: 17.8%). Subsequently, we show the utility of our benchmark by testing GPT-4o, a model released after our event (20.4% inappropriate). Of the responses that were appropriate with GPT-3.5, 21.5% were inappropriate in updated models. We share insights for constructing red teaming prompts, and present our benchmark for iterative model assessments.
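The figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is not the authors' analysis code. It assumes, although the abstract does not say so explicitly, that each of the 376 prompts was answered once by each of four models (GPT-3.5, GPT-4.0, GPT-4.0 with Internet, and GPT-4o), so that the pooled rate is the simple mean of the per-model rates. The function and constant names are hypothetical.

```python
# Figures quoted in the abstract.
UNIQUE_PROMPTS = 376
TOTAL_RESPONSES = 1504
REPORTED_OVERALL_PCT = 20.1

# Per-model rates of inappropriate responses (percent), as quoted.
# ASSUMPTION: every model answered every prompt exactly once, so each
# model contributes the same number of responses to the pooled total.
RATES_PCT = {
    "GPT-3.5": 25.8,
    "GPT-4.0": 16.0,
    "GPT-4.0 with Internet": 17.8,
    "GPT-4o": 20.4,
}


def expected_responses(prompts: int, n_models: int) -> int:
    """Number of responses if each model answers each prompt once."""
    return prompts * n_models


def pooled_rate_pct(rates_pct: dict[str, float]) -> float:
    """Pooled rate for equally sized groups, i.e. the simple mean."""
    return sum(rates_pct.values()) / len(rates_pct)


if __name__ == "__main__":
    n = expected_responses(UNIQUE_PROMPTS, len(RATES_PCT))
    print(f"expected responses: {n} (abstract: {TOTAL_RESPONSES})")
    pooled = pooled_rate_pct(RATES_PCT)
    print(f"pooled rate: {pooled:.1f}% (abstract: {REPORTED_OVERALL_PCT}%)")
```

Under these assumptions the response count matches exactly and the pooled rate comes to about 20.0%, close to the reported 20.1%. The small gap is plausibly due to rounding in the per-model percentages.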

Published in npj Digital Medicine

ISSN: 2398-6352 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: https://www.nature.com/npjdigitalmed/
