Data in Brief (Jun 2020)

Dataset on dynamics of Coronavirus on Twitter

  • Norman Aguilar-Gallegos,
  • Leticia Elizabeth Romero-García,
  • Enrique Genaro Martínez-González,
  • Edgar Iván García-Sánchez,
  • Jorge Aguilar-Ávila

Journal volume & issue
Vol. 30
p. 105684

Abstract

In this data article, we provide a dataset of 8,982,694 Twitter posts about the global coronavirus health crisis. The data were collected through the Twitter REST API search, using the rtweet R package to download the raw data. The search term was “Coronavirus”, which matched both the word itself and its hashtag version. We collected the data over 23 days, from January 21 to February 12, 2020. The dataset is multilingual, with English, Spanish, and Portuguese prevailing. We include a new variable, created from four other variables, called the “type” of tweet; it is useful for showing the diversity of tweets and the dynamics of users on Twitter. The dataset comprises seven databases that can be analysed separately. They can also be combined to support further research on, among other things, the trends and relevance of different topics, types of tweets, the embeddedness of users and their profiles, retweet dynamics, and hashtag analysis, as well as to perform social network analysis. This dataset may interest researchers from different fields of knowledge, such as data science, social science, network science, health informatics, tourism, infodemiology, and others.

Keywords