Information (Mar 2023)
Extracting Self-Reported COVID-19 Symptom Tweets and Twitter Movement Mobility Origin/Destination Matrices to Inform Disease Models
Abstract
The emergence of the novel coronavirus (COVID-19) created a need to assemble up-to-date information on its spread quickly and accurately. In this research article, we propose two ways in which Twitter data can support modelling the spread of COVID-19. (1) Machine learning algorithms trained on tweets in English, Spanish, German, Portuguese and Italian are used to identify symptomatic individuals on Twitter. Using the geo-location attached to each tweet, we map users to geographic locations to produce a time series of potentially symptomatic individuals. We calibrate an extended SEIRD epidemiological model using combinations of low-latency data feeds, including the symptomatic tweets, together with death data, and infer the model parameters. We then evaluate the usefulness of each data feed for predicting daily deaths in 50 US states, 16 Latin American countries, 2 European countries and 7 NHS (National Health Service) regions in the UK. We show that, compared with using death data alone, adding symptomatic tweets can improve the accuracy of COVID-19 death predictions, as measured by mean squared error, by 6% on average for US states and by 17% on average for the rest of the world. (2) Origin/destination (O/D) matrices for movements between seven NHS regions are constructed by detecting when a user has tweeted from two different locations within a 24 h period. We show that increasing or decreasing a social connectivity parameter within an SIR model changes the rate at which a disease spreads.
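The O/D construction in (2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes each tweet's geotag has already been mapped to a region name, and it counts a move only between a user's consecutive tweets (the abstract does not say how non-consecutive pairs are handled). The `Tweet` type, the `build_od_matrix` function, the region labels and the sample tweets are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Tweet:
    # Hypothetical record type; `region` is assumed to be already resolved from the geotag.
    user_id: str
    timestamp: datetime
    region: str


def build_od_matrix(tweets, regions, window=timedelta(hours=24)):
    """Return od[i][j]: number of observed moves from regions[i] to regions[j].

    Assumption: a move is counted when two consecutive tweets by the same user
    are posted in different regions no more than `window` apart.
    """
    index = {name: i for i, name in enumerate(regions)}
    od = [[0] * len(regions) for _ in regions]

    # Group tweets per user so each user's timeline can be scanned in time order.
    by_user = defaultdict(list)
    for tweet in tweets:
        if tweet.region in index:  # ignore tweets outside the study regions
            by_user[tweet.user_id].append(tweet)

    for timeline in by_user.values():
        timeline.sort(key=lambda t: t.timestamp)
        for prev, curr in zip(timeline, timeline[1:]):
            moved = prev.region != curr.region
            within_window = curr.timestamp - prev.timestamp <= window
            if moved and within_window:
                od[index[prev.region]][index[curr.region]] += 1
    return od


# Synthetic example data (illustrative region labels, not real tweets).
regions = ["London", "North West", "Midlands"]
tweets = [
    Tweet("u1", datetime(2020, 3, 1, 8), "London"),
    Tweet("u1", datetime(2020, 3, 1, 18), "North West"),    # 10 h later: counted
    Tweet("u1", datetime(2020, 3, 2, 10), "North West"),    # same region: ignored
    Tweet("u2", datetime(2020, 3, 1, 9), "Midlands"),
    Tweet("u2", datetime(2020, 3, 3, 9), "London"),         # 48 h later: ignored
    Tweet("u3", datetime(2020, 3, 1, 0), "North West"),
    Tweet("u3", datetime(2020, 3, 1, 23, 59), "Midlands"),  # within 24 h: counted
]

od = build_od_matrix(tweets, regions)
for name, row in zip(regions, od):
    print(f"{name:>10}: {row}")
```

Row *i* of the result counts trips starting in `regions[i]`. For the synthetic data above it records one London to North West move and one North West to Midlands move, while the 48 h gap and the same-region pair are discarded. Normalising each row by its sum would turn the counts into movement proportions, which is one possible way to feed O/D data into a multi-region SIR model.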
Keywords