Research note: Examining potential bias in large-scale censored data

Jennifer Allen; Markus Mobius; David M. Rothschild; Duncan J. Watts

doi:10.37016/mr-2020-74

Harvard Kennedy School Misinformation Review (Jul 2021)

Affiliations

Jennifer Allen: Sloan School of Management, Massachusetts Institute of Technology, USA
Markus Mobius: Microsoft Research, USA
David M. Rothschild: Microsoft Research, USA
Duncan J. Watts: Department of Computer and Information Science, University of Pennsylvania, USA

DOI: https://doi.org/10.37016/mr-2020-74
Journal volume & issue: Vol. 2, no. 4

Abstract

We examine potential bias in Facebook’s 10-trillion-cell URLs dataset, which consists of URLs shared on the platform and their engagement metrics. Although the dataset is unprecedented in size, it was altered to protect user privacy in two ways: 1) by adding differentially private noise to engagement counts, and 2) by censoring the data with a 100-public-share threshold for a URL’s inclusion. To understand how these alterations affect conclusions drawn from the data, we estimate the prevalence of fake news in the massive, censored URLs dataset and compare it to an estimate from a smaller, representative dataset. We show that censoring can substantially alter the conclusions drawn from the Facebook dataset. Because of the 100-public-share threshold, descriptive statistics from the Facebook URLs dataset overestimate the share of fake news, and of news overall, by as much as a factor of four. We conclude with more general implications of censoring data.
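To make the censoring mechanism concrete, here is a minimal Python sketch on made-up data. It is not the authors' method or data. The `Url` type, the `category_share` helper, the category labels, and every share count are hypothetical. The sketch measures prevalence as a fraction of public shares rather than of views or engagement, and it ignores the differentially private noise entirely. It shows only that dropping URLs below a 100-public-share threshold can remove a long tail of rarely shared, non-news URLs, which inflates the apparent share of news and fake news among what remains.

```python
from dataclasses import dataclass

PUBLIC_SHARE_THRESHOLD = 100  # inclusion threshold described in the abstract


@dataclass(frozen=True)
class Url:
    category: str       # hypothetical labels: "fake", "news", or "other"
    public_shares: int  # hypothetical public share count for this URL


def category_share(urls, category, threshold=0):
    """Fraction of total public shares going to `category`, counting only
    URLs with at least `threshold` public shares (threshold=0 keeps all)."""
    kept = [u for u in urls if u.public_shares >= threshold]
    total = sum(u.public_shares for u in kept)
    if total == 0:
        return 0.0
    return sum(u.public_shares for u in kept if u.category == category) / total


# Synthetic, invented data: a few widely shared news and fake-news URLs,
# a few widely shared "other" URLs, and a long tail of rarely shared ones.
urls = (
    [Url("fake", 500) for _ in range(2)]
    + [Url("news", 300) for _ in range(10)]
    + [Url("other", 400) for _ in range(5)]
    + [Url("other", 20) for _ in range(300)]  # tail: falls below the threshold
)

for cat in ("fake", "news"):
    full = category_share(urls, cat)
    censored = category_share(urls, cat, threshold=PUBLIC_SHARE_THRESHOLD)
    print(f"{cat}: full={full:.3f} censored={censored:.3f} ratio={censored / full:.1f}")
# fake: full=0.083 censored=0.167 ratio=2.0
# news: full=0.250 censored=0.500 ratio=2.0
```

With these invented numbers, the censored estimate is twice the uncensored one for both categories. The size of the bias depends entirely on how much of the total activity falls below the threshold, and no single dataset can reveal that amount once it has been censored.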

Published in Harvard Kennedy School Misinformation Review

ISSN: 2766-1652 (Online)
Publisher: Harvard Kennedy School
Country of publisher: United States
Website: https://misinforeview.hks.harvard.edu
