Controlling Bias Between Categorical Attributes in Datasets: A Two-Step Optimization Algorithm Leveraging Structural Equation Modeling

Enrico Barbierato; Andrea Pozzi; Daniele Tessera

doi:10.1109/ACCESS.2023.3325235

IEEE Access (Jan 2023)

Affiliations

Enrico Barbierato: Faculty of Mathematical, Physical and Natural Sciences, Catholic University of Sacred Heart, Brescia, Italy
Andrea Pozzi: Faculty of Mathematical, Physical and Natural Sciences, Catholic University of Sacred Heart, Brescia, Italy
Daniele Tessera: Faculty of Mathematical, Physical and Natural Sciences, Catholic University of Sacred Heart, Brescia, Italy

DOI: https://doi.org/10.1109/ACCESS.2023.3325235
Journal volume & issue: Vol. 11, pp. 115493–115510

Abstract

In the realm of data-driven systems, understanding and controlling biases in datasets emerges as a critical challenge. These biases, defined in this study as systematic discrepancies, have the potential to skew algorithmic outcomes and even compromise data privacy. Mutual information serves as a key tool in the analysis, discerning both direct and indirect relationships between variables. Utilizing structural equation modeling, this paper introduces a synthetic dataset generation method founded on a two-step optimization algorithm that aims to fine-tune variable relationships and achieve targeted mutual information levels between attribute pairs. The algorithm’s first phase utilizes gradient-less optimization, focusing on individual variables. The subsequent phase harnesses gradient-based methods to unravel deeper variable interdependencies. The approach is dual-purpose: it refines existing datasets for bias mitigation and creates synthetic datasets with defined bias levels, addressing a crucial research gap. Two case studies showcase the methodology. One emphasizes the finesse of network parameter adjustments in a simulated setting. The other applies the methodology to a realistic job hiring dataset, effectively reducing bias while safeguarding key variable relationships. In summary, this paper offers a novel method for bias management, presents tools for quantitative bias adjustments, and provides evidence of the method’s broad applicability through varied use cases.
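
The abstract relies on mutual information as the measure of dependence between pairs of categorical attributes. As a rough illustration only, the sketch below computes the standard empirical mutual information (in bits) from co-occurrence counts. It is not the authors' implementation; the function name `mutual_information` and the toy attributes `gender`, `hired_dependent`, and `hired_independent` are hypothetical and chosen purely for the example.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information, in bits, between two categorical sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written in terms of counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


# Hypothetical toy data (not taken from the paper).
gender = ["f", "f", "m", "m"]
hired_dependent = ["no", "no", "yes", "yes"]    # outcome fully determined by gender
hired_independent = ["no", "yes", "no", "yes"]  # outcome independent of gender

print(mutual_information(gender, hired_dependent))
print(mutual_information(gender, hired_independent))
```

In this toy setting the fully dependent pair carries one bit of mutual information and the independent pair carries none. A bias-control procedure of the kind the abstract describes would aim to move such a value toward a chosen target between these extremes.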

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
