Mitigating Large Language Model Bias: Automated Dataset Augmentation and Prejudice Quantification

Devam Mondal; Carlo Lipizzi

doi:10.3390/computers13060141

Computers (Jun 2024)

Affiliations

Devam Mondal: Center for Complex Systems and Enterprises, Stevens Institute of Technology, Hoboken, NJ 07030, USA
Carlo Lipizzi: Center for Complex Systems and Enterprises, Stevens Institute of Technology, Hoboken, NJ 07030, USA

DOI: https://doi.org/10.3390/computers13060141
Journal volume & issue: Vol. 13, no. 6, p. 141

Abstract

Despite the growing capabilities of large language models, concerns remain about the biases they develop. In this paper, we propose a novel, automated mechanism for debiasing through specified dataset augmentation, viewed through the lens of bias producers, that can be useful in a variety of industries, especially “restricted” ones with limited data. We consider that bias can arise from both intrinsic model architecture and dataset quality, and we evaluate these two aspects using two different metrics we created. We show that our dataset augmentation algorithm reduces bias as measured by our metrics. Our code is available in an online GitHub repository.

Published in Computers

ISSN: 2073-431X (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://www.mdpi.com/journal/computers
