Applied Sciences (May 2022)

A Methodology for Controlling Bias and Fairness in Synthetic Data Generation

  • Enrico Barbierato,
  • Marco L. Della Vedova,
  • Daniele Tessera,
  • Daniele Toti,
  • Nicola Vanoli

DOI
https://doi.org/10.3390/app12094619
Journal volume & issue
Vol. 12, no. 9
p. 4619

Abstract

The development of algorithms, based on machine learning techniques, that support (or even replace) human judgment must take into account concepts such as data bias and fairness. Although the scientific literature proposes numerous techniques to detect and evaluate these problems, less attention has been dedicated to methods for generating intentionally biased datasets, which data scientists could use to develop and validate unbiased and fair decision-making algorithms. To this end, this paper presents a novel method to generate a synthetic dataset in which bias can be modeled by means of a probabilistic network exploiting structural equation modeling. The proposed methodology has been validated on a simple dataset, to highlight the impact of the tuning parameters on bias and fairness, as well as on a more realistic example based on a loan approval status dataset. In particular, this methodology requires a limited number of parameters compared to other techniques for generating datasets with a controlled amount of bias and fairness.
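The core idea described in the abstract — injecting a controlled amount of bias into synthetic data through structural equations — can be illustrated with a minimal sketch. All variable names, the linear structural equations, and the single `bias_weight` tuning parameter below are illustrative assumptions, not the paper's actual model: a protected attribute A influences a feature X and, through the tunable weight, a binary decision Y.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Tuning parameter controlling the amount of injected bias
# (hypothetical; the paper's parameterization may differ).
bias_weight = 1.5

# Structural equations: each variable is a function of its
# parents in the probabilistic network plus Gaussian noise.
A = rng.integers(0, 2, size=n)               # protected attribute (0/1)
X = 0.8 * A + rng.normal(0.0, 1.0, size=n)   # feature depends on A
score = 1.0 * X + bias_weight * A + rng.normal(0.0, 1.0, size=n)
Y = (score > np.median(score)).astype(int)   # binary decision outcome

# A simple fairness measure: the demographic-parity gap, i.e. the
# difference in positive-outcome rates between the two groups.
# It grows as bias_weight increases and vanishes when it is 0.
gap = Y[A == 1].mean() - Y[A == 0].mean()
print(f"demographic parity gap: {gap:.2f}")
```

Sweeping `bias_weight` over a range and recomputing the gap would reproduce, in spirit, the kind of parameter-to-fairness analysis the abstract describes for the validation datasets.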

Keywords