Technologies (Nov 2020)

Using Bias Parity Score to Find Feature-Rich Models with Least Relative Bias

  • Bhanu Jain,
  • Manfred Huber,
  • Ramez Elmasri,
  • Leonidas Fegaras

DOI
https://doi.org/10.3390/technologies8040068
Journal volume & issue
Vol. 8, no. 4
p. 68

Abstract

Machine learning-based decision support systems bring relief and support to the decision-maker in many domains such as loan application acceptance, dating, hiring, granting parole, insurance coverage, and medical diagnoses. These support systems facilitate processing tremendous amounts of data to decipher the patterns embedded in them. However, these systems can also absorb and amplify bias embedded in the data. To address this, the work presented in this paper introduces a new fairness measure as well as an enhanced, feature-rich representation derived from the temporal aspects of the data set, which permits selecting the lowest-bias model from the set of models learned on various versions of the augmented feature set. Specifically, our approach uses neural networks to forecast recidivism from many unique feature-rich models created from the same raw offender dataset. We create multiple records from the single summarizing criminal record per offender in the raw dataset. This is achieved by grouping each set of arrest-to-release information into a unique record. We use offenders’ criminal history, substance abuse, and treatments taken during imprisonment over different numbers of past arrests to enrich the input feature vectors for the prediction models generated. We propose a fairness measure called the Bias Parity (BP) score to quantify the decrease in bias in the prediction models. The BP score leverages an existing intuition of bias awareness and summarizes it in a single measure. We demonstrate how the BP score can be used to quantify bias for a variety of statistical quantities and how to associate disparate impact with this measure. By using our feature enrichment approach, we could increase the accuracy of predicting recidivism for the same dataset from 77.8% in another study to 89.2% in the current study while achieving an improved BP score, computed for average accuracy, of 99.4, where a value of 100 means no bias for the two subpopulation groups compared. Moreover, an analysis of the accuracy and BP scores for various levels of our feature augmentation method shows consistent trends among scores for a range of fairness measures, illustrating the benefit of the method for picking fairer models without significant loss of accuracy.
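The abstract does not give the formula for the BP score. It says only that the score summarizes bias between two subpopulation groups in a single number, with 100 meaning no bias. As a minimal sketch, assuming a ratio-based definition (the smaller group statistic divided by the larger, times 100), such a score might look like the following. The function name, the ratio form, the tie-breaking for two zero values, and the example numbers are all illustrative assumptions, not taken from the paper.

```python
def bias_parity_score(metric_a: float, metric_b: float) -> float:
    """Hypothetical parity score for one statistic measured on two groups.

    ASSUMPTION: the score is 100 * min / max of the two group values.
    The abstract only states that 100 means no bias between the groups.
    """
    if metric_a < 0 or metric_b < 0:
        raise ValueError("group statistics are assumed to be non-negative")
    larger = max(metric_a, metric_b)
    if larger == 0:
        # Both values are zero, so the groups are treated identically.
        return 100.0
    return 100.0 * min(metric_a, metric_b) / larger


# Made-up group accuracies for illustration only (not results from the paper).
accuracy_group_1 = 0.90
accuracy_group_2 = 0.87
score = bias_parity_score(accuracy_group_1, accuracy_group_2)
print(f"parity score for accuracy: {score:.1f}")
```

Under this assumed form the score is symmetric in the two groups and can be applied to any non-negative statistic (accuracy, false positive rate, and so on). A threshold such as 80 would then correspond to the familiar four-fifths rule used in disparate impact analysis, although whether the paper makes the association in this way is not stated in the abstract.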

Keywords