EPJ Data Science (May 2025)

Measuring biases in AI-generated co-authorship networks

  • Ghazal Kalhor,
  • Shiza Ali,
  • Afra Mashhadi

DOI
https://doi.org/10.1140/epjds/s13688-025-00555-9
Journal volume & issue
Vol. 14, no. 1
pp. 1–33

Abstract

Large Language Models (LLMs) have significantly advanced prompt-based information retrieval, yet their potential to reproduce or amplify social biases remains insufficiently understood. In this study, we investigate this issue through the concrete task of reconstructing real-world co-authorship networks of computer science (CS) researchers using two widely used LLMs—GPT-3.5 Turbo and Mixtral 8x7B. This task offers a structured and quantifiable way to evaluate whether LLM-generated scholarly relationships reflect demographic disparities, as co-authorship is a key proxy for collaboration and recognition in academia. We compare the LLM-generated networks to baseline networks derived from DBLP and Google Scholar, employing both statistical and network science approaches to assess biases related to gender and ethnicity. Our findings show that both LLMs tend to produce more accurate co-authorship links for individuals with Asian or White names, particularly among researchers with lower visibility or limited academic impact. While we find no significant gender disparities in accuracy, the models systematically favor generating co-authorship links that overrepresent Asian and White individuals. Additionally, the structural properties of the LLM-generated networks differ from those of the baseline networks. These results highlight the importance of examining how LLMs represent social and scientific relationships, particularly in contexts where they are increasingly used for knowledge discovery and scholarly search.

Keywords