IEEE Access (Jan 2024)
Reducing Bias in Sentiment Analysis Models Through Causal Mediation Analysis and Targeted Counterfactual Training
Abstract
Large language models provide high-accuracy solutions for many natural language processing tasks; in particular, they are used to provide the word embeddings in sentiment analysis models. However, these models pick up and amplify biases and social stereotypes present in their training data. Causality theory has recently driven the development of effective algorithms to evaluate and mitigate these biases: causal mediation analysis has been used to detect bias, while counterfactual training has been proposed to mitigate it. In both cases, counterfactual sentences are created by changing an attribute, such as the gender of a noun, for which no change in the model output is expected. Bias is detected, and can subsequently be corrected, whenever the model's behavior differs between the original and the counterfactual sentence. We propose a new method for de-biasing sentiment analysis models that leverages causal mediation analysis to identify the parts of the model primarily responsible for the bias and then applies targeted counterfactual training to those parts for model de-biasing. We validated the methodology by fine-tuning the pre-trained Bidirectional Encoder Representations from Transformers (BERT) model for sentiment prediction. We trained two sentiment analysis models, on the Stanford Sentiment Treebank dataset and the Amazon Product Reviews dataset respectively, and evaluated fairness and prediction performance using the Equity Evaluation Corpus. We illustrated the causal patterns in the network and showed that our method achieves both higher fairness and more accurate sentiment analysis than the state-of-the-art approach. Unlike state-of-the-art models, we achieved a noticeable improvement in gender fairness without hindering sentiment prediction accuracy.
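As a minimal illustration of the counterfactual comparison described above (not the paper's code), the following sketch builds a gender-swapped counterfactual of a sentence and measures how much a sentiment classifier's output shifts; the word-swap list and the default Hugging Face sentiment pipeline are assumptions made for the example.

```python
# Illustrative sketch: compare a sentiment model's output on a sentence and its
# gender-swapped counterfactual; a large gap suggests gender-dependent behavior.
from transformers import pipeline

# Hypothetical word-swap map used to build counterfactuals (example only).
GENDER_SWAPS = {"he": "she", "she": "he", "him": "her", "her": "him",
                "his": "her", "man": "woman", "woman": "man"}

def make_counterfactual(sentence: str) -> str:
    """Swap gendered tokens; no change in sentiment is expected."""
    return " ".join(GENDER_SWAPS.get(tok.lower(), tok) for tok in sentence.split())

def bias_gap(sentence: str, clf) -> float:
    """Positive-sentiment score of the original minus that of its counterfactual."""
    def pos_score(text: str) -> float:
        out = clf(text)[0]
        return out["score"] if out["label"] == "POSITIVE" else 1.0 - out["score"]
    return pos_score(sentence) - pos_score(make_counterfactual(sentence))

if __name__ == "__main__":
    clf = pipeline("sentiment-analysis")  # default BERT-family sentiment model
    gap = bias_gap("He is feeling ecstatic about the results.", clf)
    print(f"Bias gap (original - counterfactual): {gap:+.3f}")
```

In the paper's approach, such counterfactual pairs are used twice: causal mediation analysis localizes which model components drive the output gap, and targeted counterfactual training then fine-tunes only those components.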
Keywords