The Sociodemographic Biases in Machine Learning Algorithms: A Biomedical Informatics Perspective

Gillian Franklin; Rachel Stephens; Muhammad Piracha; Shmuel Tiosano; Frank Lehouillier; Ross Koppel; Peter L. Elkin

doi:10.3390/life14060652

Life (May 2024)


Affiliations

All authors: Department of Biomedical Informatics, University at Buffalo, Buffalo, NY 14203, USA

DOI: https://doi.org/10.3390/life14060652
Journal volume & issue: Vol. 14, no. 6, p. 652

Abstract


Artificial intelligence models implemented as machine learning algorithms are promising tools for risk assessment that can guide clinical and other health care decisions. Machine learning algorithms, however, may harbor biases that propagate stereotypes, inequities, and discrimination, contributing to socioeconomic health care disparities. These include biases related to sociodemographic characteristics such as race, ethnicity, gender, age, insurance, and socioeconomic status, which can arise from the use of erroneous electronic health record data. Additionally, there is concern that training data and algorithmic biases in large language models pose potential drawbacks. These biases affect the lives and livelihoods of a significant percentage of the population in the United States and globally. The social and economic consequences of the associated backlash should not be underestimated. Here, we outline some of the sociodemographic, training data, and algorithmic biases that undermine sound health care risk assessment and medical decision-making and that should be addressed in the health care system. We present a perspective and overview of these biases by gender, race, ethnicity, age, historically marginalized communities, algorithmic bias, biased evaluations, implicit bias, selection/sampling bias, socioeconomic status biases, biased data distributions, cultural biases, insurance status bias, confirmation bias, information bias, and anchoring bias. We also make recommendations to improve large language model training data, including de-biasing techniques such as counterfactual role-reversed sentences during knowledge distillation, fine-tuning, prefix attachment at training time, the use of toxicity classifiers, retrieval augmented generation, and algorithmic modification to mitigate these biases moving forward.
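To make one of the listed techniques concrete, the sketch below illustrates what counterfactual role-reversed sentences can look like. It is a minimal illustration, not the authors' method. Each training sentence is paired with a copy in which gendered terms are swapped, so a model sees both variants during fine-tuning or distillation. The swap list `SWAP_PAIRS` and the functions `role_reverse` and `augment` are hypothetical names chosen for illustration. The word list is deliberately tiny. Ambiguous pronouns such as "her" (which maps to "him" or "his" depending on grammar) are left out on purpose.

```python
import re

# Hypothetical, deliberately small symmetric swap list (assumption, not from the article).
# "her"/"his"/"him" are excluded because "her" can map to either "him" or "his",
# which a simple word-level lookup cannot resolve.
SWAP_PAIRS = {
    "he": "she", "she": "he",
    "himself": "herself", "herself": "himself",
    "man": "woman", "woman": "man",
    "men": "women", "women": "men",
    "male": "female", "female": "male",
    "boy": "girl", "girl": "boy",
}

# Whole-word, case-insensitive match on any term in the swap list.
_TERMS = re.compile(r"\b(" + "|".join(SWAP_PAIRS) + r")\b", re.IGNORECASE)


def role_reverse(sentence: str) -> str:
    """Return a counterfactual copy of `sentence` with gendered terms swapped.

    A leading capital letter on the original word is carried over to its replacement.
    """
    def swap(match: re.Match) -> str:
        word = match.group(0)
        replacement = SWAP_PAIRS[word.lower()]
        return replacement.capitalize() if word[0].isupper() else replacement

    return _TERMS.sub(swap, sentence)


def augment(corpus: list[str]) -> list[str]:
    """Pair each sentence with its role-reversed counterfactual when one differs."""
    augmented = []
    for sentence in corpus:
        augmented.append(sentence)
        counterfactual = role_reverse(sentence)
        if counterfactual != sentence:
            augmented.append(counterfactual)
    return augmented


if __name__ == "__main__":
    print(role_reverse("She is a 54-year-old woman."))
    print(augment(["The patient is male.", "Vitals are stable."]))
```

A real pipeline would need a much larger, clinically reviewed lexicon. It would also need handling for names and grammatically ambiguous pronouns. Where the augmented pairs enter training (distillation versus fine-tuning) is a separate design choice.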

Published in Life

ISSN: 2075-1729 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science
Website: http://www.mdpi.com/journal/life
