The myth of reproducibility: A review of event tracking evaluations on Twitter

Nicholas Mamo; Joel Azzopardi; Colin Layfield

doi:10.3389/fdata.2023.1067335

Frontiers in Big Data (Apr 2023)

Affiliations

Nicholas Mamo: Department of Artificial Intelligence, Faculty of Information and Communication Technology, University of Malta, Msida, Malta
Joel Azzopardi: Department of Artificial Intelligence, Faculty of Information and Communication Technology, University of Malta, Msida, Malta
Colin Layfield: Department of Computer Information Systems, Faculty of Information and Communication Technology, University of Malta, Msida, Malta

Journal volume & issue: Vol. 6

Abstract

Event tracking literature based on Twitter does not have a state-of-the-art. What it does have is a plethora of manual evaluation methodologies and inventive automatic alternatives: incomparable and irreproducible studies that are incongruous with the idea of a state-of-the-art. Many researchers blame Twitter's data sharing policy for the lack of common datasets and a universal ground truth, and therefore for the lack of reproducibility, but many other issues stem from the conscious decisions of those same researchers. In this paper, we present the most comprehensive review yet of the evaluations used in Twitter event tracking literature. We explore the challenges of manual experiments, the insufficiencies of automatic analyses and the misguided notions about reproducibility. Crucially, we discredit the widely held belief that reusing tweet datasets could induce reproducibility. We show that tweet datasets self-sanitize over time: spam and noise become unavailable at much higher rates than legitimate content, rendering re-downloaded datasets incomparable with the original. Nevertheless, we argue that Twitter's policy can be a hindrance without being an insurmountable barrier, and we propose how the research community can make its evaluations more reproducible. A state-of-the-art remains attainable for event tracking research.

Published in Frontiers in Big Data

ISSN: 2624-909X (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: https://www.frontiersin.org/journals/big-data