A Large-Scale COVID-19 Twitter Chatter Dataset for Open Scientific Research—An International Collaboration

Juan M. Banda; Ramya Tekumalla; Guanyu Wang; Jingyuan Yu; Tuo Liu; Yuning Ding; Ekaterina Artemova; Elena Tutubalina; Gerardo Chowell

doi:10.3390/epidemiologia2030024

Epidemiologia (Aug 2021)

Affiliations

Juan M. Banda: Department of Computer Science, Georgia State University, Atlanta, GA 30303, USA
Ramya Tekumalla: Department of Computer Science, Georgia State University, Atlanta, GA 30303, USA
Guanyu Wang: Missouri School of Journalism, University of Missouri, Columbia, MO 65201, USA
Jingyuan Yu: Department of Social Psychology, Universitat Autònoma de Barcelona, 08035 Barcelona, Spain
Tuo Liu: Department of Psychology, Carl von Ossietzky Universität Oldenburg, 26129 Oldenburg, Germany
Yuning Ding: Language Technology Lab, Universität Duisburg-Essen, 47057 Duisburg, Germany
Ekaterina Artemova: Faculty of Computer Science, Higher School of Economics—National Research University, 101000 Moscow, Russia
Elena Tutubalina: Faculty of Chemistry, Kazan Federal University, 420008 Kazan, Russia
Gerardo Chowell: Department of Population Health Sciences, Georgia State University, Atlanta, GA 30303, USA

Journal volume & issue: Vol. 2, no. 3, pp. 315–324

Abstract

As the COVID-19 pandemic continues to spread worldwide, an unprecedented amount of open data is being generated for medical, genetic, and epidemiological research. The unparalleled rate at which research groups around the world are releasing data and publications on the ongoing pandemic allows other scientists to learn from local experiences and from data generated on the front lines. However, there is a need to integrate additional data sources that map and measure the role of social dynamics in such a unique worldwide event into biomedical, biological, and epidemiological analyses. For this purpose, we present a large-scale curated dataset of over 1.12 billion tweets related to COVID-19 chatter, covering 1 January 2020 to 27 June 2021 at the time of writing and growing daily. The dataset is a freely available resource that researchers worldwide can use for a wide range of projects, such as epidemiological analyses, studies of emotional and mental responses to social distancing measures, identification of sources of misinformation, and stratified measurement of sentiment towards the pandemic in near real time, among many others.

Published in Epidemiologia

ISSN: 2673-3986 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Medicine: Internal medicine
Website: https://www.mdpi.com/journal/epidemiologia
