Natural Language Processing Journal (Sep 2024)
Contrastive adversarial gender debiasing
Abstract
This research presents a comprehensive analysis of gender bias in contemporary AI language models, specifically examining iterations of the GPT series alongside Gemini and Llama. The study offers a systematic investigation spanning sentence completions, generative narratives, bilingual analysis, and visual perception assessments. The primary objectives are to scrutinize how gender bias evolves across model iterations, explore biases associated with professions and contexts, and evaluate multilingual disparities. Notably, the analyses reveal a marked evolution across GPT iterations, with GPT-4 exhibiting significantly reduced or negligible biases, signifying substantial advances in bias mitigation. Across professions and contexts, however, the models exhibit biases that associate particular roles with specific genders. Multilingual evaluations demonstrate subtle disparities in gender bias tendencies between English and Spanish narratives. To mitigate these biases, we propose a novel Contrastive Adversarial Gender Debiasing (CAGD) method that combines contrastive learning and adversarial training. CAGD enables language models to learn gender-neutral representations while remaining robust to gender bias, consistently outperforming both the original and adversarially debiased models across tasks and metrics. These findings underscore the complexity of gender bias in AI language models and emphasize the need for continual bias mitigation strategies, such as the proposed CAGD approach, together with ethical considerations in AI development and deployment.
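The abstract only names the two ingredients of CAGD. As a minimal illustrative sketch, not the authors' exact formulation, the objective might pair a contrastive term that pulls together representations of a sentence and its gender-swapped counterpart with an adversarial term in which a gender classifier is trained through gradient reversal so the encoder is pushed toward gender-neutral representations. All module names, dimensions, and hyperparameters below are assumptions for illustration.

```python
# Illustrative sketch only: the paper's CAGD details are not specified in the abstract.
# Assumptions: gender-swapped sentence pairs serve as contrastive positives, and a
# gradient-reversal adversary tries to predict gender from the encoder output.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradientReversal(torch.autograd.Function):
    """Identity on the forward pass; flips the gradient sign on the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class CAGDHead(nn.Module):
    """Hypothetical debiasing head: contrastive projection plus a gender adversary."""
    def __init__(self, hidden_dim=768, proj_dim=128, lambd=1.0):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, proj_dim)   # projection for contrastive loss
        self.adversary = nn.Linear(hidden_dim, 2)     # tries to predict gender label
        self.lambd = lambd

    def forward(self, h_orig, h_swapped, gender_labels, temperature=0.1):
        # Contrastive term: a sentence and its gender-swapped version are positives;
        # other sentences in the batch act as in-batch negatives.
        z1 = F.normalize(self.proj(h_orig), dim=-1)
        z2 = F.normalize(self.proj(h_swapped), dim=-1)
        logits = z1 @ z2.t() / temperature
        targets = torch.arange(z1.size(0), device=z1.device)
        contrastive_loss = F.cross_entropy(logits, targets)

        # Adversarial term: gradient reversal trains the encoder so the adversary
        # cannot recover gender from the representation.
        reversed_h = GradientReversal.apply(h_orig, self.lambd)
        adv_loss = F.cross_entropy(self.adversary(reversed_h), gender_labels)

        return contrastive_loss + adv_loss


# Usage with dummy encoder outputs (batch of 4, hidden size 768).
head = CAGDHead()
h_orig = torch.randn(4, 768)
h_swapped = torch.randn(4, 768)
gender = torch.tensor([0, 1, 0, 1])
loss = head(h_orig, h_swapped, gender)
loss.backward()
```

In this sketch the combined loss would be added to the model's ordinary training objective; the weighting between the contrastive and adversarial terms, and how gender-swapped pairs are constructed, are design choices the abstract does not specify.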