Bias and Unfairness in Machine Learning Models: A Systematic Review on Datasets, Tools, Fairness Metrics, and Identification and Mitigation Methods

Tiago P. Pagano; Rafael B. Loureiro; Fernanda V. N. Lisboa; Rodrigo M. Peixoto; Guilherme A. S. Guimarães; Gustavo O. R. Cruz; Maira M. Araujo; Lucas L. Santos; Marco A. S. Cruz; Ewerton L. S. Oliveira; Ingrid Winkler; Erick G. S. Nascimento

doi:10.3390/bdcc7010015

Big Data and Cognitive Computing (Jan 2023)

Affiliations

Tiago P. Pagano: Computational Modeling Department, SENAI CIMATEC University Center, Salvador 41650-010, BA, Brazil
Rafael B. Loureiro: Computational Modeling Department, SENAI CIMATEC University Center, Salvador 41650-010, BA, Brazil
Fernanda V. N. Lisboa: Computing Engineering Department, SENAI CIMATEC University Center, Salvador 41650-010, BA, Brazil
Rodrigo M. Peixoto: Software Development Department, SENAI CIMATEC University Center, Salvador 41650-010, BA, Brazil
Guilherme A. S. Guimarães: Software Development Department, SENAI CIMATEC University Center, Salvador 41650-010, BA, Brazil
Gustavo O. R. Cruz: Software Development Department, SENAI CIMATEC University Center, Salvador 41650-010, BA, Brazil
Maira M. Araujo: Computing Engineering Department, SENAI CIMATEC University Center, Salvador 41650-010, BA, Brazil
Lucas L. Santos: Computational Modeling Department, SENAI CIMATEC University Center, Salvador 41650-010, BA, Brazil
Marco A. S. Cruz: HP Inc. Brazil R&D, Porto Alegre 90619-900, RS, Brazil
Ewerton L. S. Oliveira: HP Inc. Brazil R&D, Porto Alegre 90619-900, RS, Brazil
Ingrid Winkler: Management and Industrial Technology Department, SENAI CIMATEC University Center, Salvador 41650-010, BA, Brazil
Erick G. S. Nascimento: Computational Modeling Department, SENAI CIMATEC University Center, Salvador 41650-010, BA, Brazil

DOI: https://doi.org/10.3390/bdcc7010015
Journal volume & issue: Vol. 7, no. 1, p. 15

Abstract

One of the difficulties of artificial intelligence is ensuring that model decisions are fair and free of bias. In research, datasets, metrics, techniques, and tools are applied to detect and mitigate algorithmic unfairness and bias. This study examines the current knowledge on bias and unfairness in machine learning models. The systematic review followed the PRISMA guidelines and is registered on the OSF platform. The search was carried out between 2021 and early 2022 in the Scopus, IEEE Xplore, Web of Science, and Google Scholar knowledge bases and found 128 articles published between 2017 and 2022, of which 45 were chosen based on search string optimization and inclusion and exclusion criteria. We discovered that the majority of retrieved works focus on bias and unfairness identification and mitigation techniques, offering tools, statistical approaches, important metrics, and datasets typically used for bias experiments. The primary forms of bias (data, algorithm, and user interaction) were addressed in connection with the preprocessing, in-processing, and postprocessing mitigation methods. The use of Equalized Odds, Opportunity Equality, and Demographic Parity as primary fairness metrics emphasizes the crucial role of sensitive attributes in mitigating bias. The 25 datasets chosen span a wide range of areas, including criminal justice, image enhancement, finance, education, product pricing, and health, with the majority including sensitive attributes. In terms of tools, Aequitas is the most often referenced, yet many of the tools were not employed in empirical experiments. A limitation of current research is the lack of multiclass and multimetric studies, which are found in just a few works and constrain the investigation to binary-focused methods. Furthermore, the results indicate that different fairness metrics do not present uniform results for a given use case, and that more research with varied model architectures is necessary to standardize which ones are more appropriate for a given context. We also observed that all research addressed the transparency of the algorithm, or its capacity to explain how decisions are taken.

Published in Big Data and Cognitive Computing

ISSN: 2504-2289 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology
Website: http://www.mdpi.com/journal/BDCC
