JMIR Public Health and Surveillance (Sep 2021)
Tracking Self-reported Symptoms and Medical Conditions on Social Media During the COVID-19 Pandemic: Infodemiological Study
Abstract
Background: Harnessing health-related data posted on social media in real time can offer insights into how the pandemic impacts the mental health and general well-being of individuals and populations over time.
Objective: This study aimed to obtain information on symptoms and medical conditions self-reported by non-Twitter social media users during the COVID-19 pandemic, to determine how discussion of these symptoms and medical conditions changed over time, and to identify correlations between the frequency of posts mentioning the 5 most common symptoms and daily COVID-19 statistics (new cases, new deaths, new active cases, and new recovered cases) in the United States.
Methods: We used natural language processing (NLP) algorithms to identify symptom- and medical condition–related topics being discussed on social media between June 14 and December 13, 2020. The sample posts were geotagged by NetBase, a third-party data provider. We calculated the positive predictive value and sensitivity to validate the classification of posts. We also assessed the frequency of health-related discussions on social media over the study period and used Pearson correlation coefficients to identify statistically significant correlations between the frequency of the 5 most commonly mentioned symptoms and fluctuations in daily US COVID-19 statistics.
Results: Within a total of 9,807,813 posts (nearly 70% of which were sourced from the United States), we identified discussions of 120 symptom-related topics and 1542 medical condition–related topics. Our classification of the health-related posts had a positive predictive value of over 80% and an average sensitivity of 92%. The 5 most commonly mentioned symptoms on social media during the study period were anxiety (in 201,303 posts or 12.2% of the total posts mentioning symptoms), generalized pain (189,673, 11.5%), weight loss (95,793, 5.8%), fatigue (91,252, 5.5%), and coughing (86,235, 5.2%).
The 5 most discussed medical conditions were COVID-19 (in 5,420,276 posts or 66.4% of the total posts mentioning medical conditions), unspecified infectious disease (469,356, 5.8%), influenza (270,166, 3.3%), unspecified disorders of the central nervous system (253,407, 3.1%), and depression (151,752, 1.9%). Changes in the frequency of posts mentioning anxiety, generalized pain, and weight loss were significantly and negatively correlated with daily new COVID-19 cases in the United States (r=-0.49, r=-0.46, and r=-0.39, respectively; P<.05). The frequency of posts mentioning anxiety, generalized pain, weight loss, and fatigue, as well as changes in the frequency of fatigue posts, correlated positively and significantly with daily changes in both new deaths and new active cases in the United States (r range 0.39-0.48; P<.05).
Conclusions: COVID-19 and symptoms of anxiety were the 2 most commonly discussed health-related topics on social media from June 14 to December 13, 2020. Real-time monitoring of social media posts on symptoms and medical conditions may help assess the population’s mental health status and enhance public health surveillance for infectious disease.
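
The validation metrics and the correlation analysis named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The function names (`ppv_and_sensitivity`, `pearson_r`, `daily_changes`) and all numbers are hypothetical, chosen only for illustration. The abstract does not state exactly how daily changes were derived, so simple day-to-day differences are an assumption here, and the sketch does not compute P values.

```python
import math


def ppv_and_sensitivity(tp, fp, fn):
    """Return (PPV, sensitivity) from counts in a manually reviewed sample.

    tp: posts correctly classified as mentioning the topic
    fp: posts classified as mentioning the topic that do not
    fn: posts mentioning the topic that the classifier missed
    """
    ppv = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    return ppv, sensitivity


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def daily_changes(series):
    """Day-to-day differences (an assumed definition of 'daily change')."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Made-up daily counts for one week; NOT data from the study.
anxiety_posts = [1200, 1150, 1300, 1100, 1050, 1000, 980]
new_cases = [20000, 22000, 21000, 25000, 27000, 30000, 31000]

r_levels = pearson_r(anxiety_posts, new_cases)
r_changes = pearson_r(daily_changes(anxiety_posts), daily_changes(new_cases))
print(f"r (levels) = {r_levels:.2f}, r (daily changes) = {r_changes:.2f}")
```

In practice a statistics library would also be used to obtain the P value for each coefficient. The manual implementation here only keeps the sketch free of dependencies.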