Quantifying Depression-Related Language on Social Media During the COVID-19 Pandemic

Brent Davis; Dawn Estes McKnight; Daniela Teodorescu; Anabel Quan-Haase; Rumi Chunara; Alona Fyshe; Daniel J. Lizotte

doi:10.23889/ijpds.v5i4.1716

International Journal of Population Data Science (Mar 2022)

Affiliations

Brent Davis: Department of Computer Science, Western University, London, ON, Canada, N6A 3K7
Dawn Estes McKnight: Department of Computing Science, University of Alberta, Edmonton, AB, T6G 2R3
Daniela Teodorescu: Department of Computing Science, University of Alberta, Edmonton, AB, T6G 2R3
Anabel Quan-Haase: Department of Sociology, Western University, London, ON, Canada, N6A 3K7; Faculty of Information and Media Studies, Western University, London, ON, Canada, N6A 3K7
Rumi Chunara: Department of Computer Science & Engineering, New York University, New York, NY, 10003; Department of Biostatistics, New York University, New York, NY, 10003
Alona Fyshe: Department of Computing Science, University of Alberta, Edmonton, AB, T6G 2R3; Department of Psychology, University of Alberta, Edmonton, AB, Canada, T6G 2R3
Daniel J. Lizotte: Department of Computer Science, Western University, London, ON, Canada, N6A 3K7; Department of Epidemiology and Biostatistics, Western University, London, ON, N6A 3K7

DOI: https://doi.org/10.23889/ijpds.v5i4.1716
Journal volume & issue: Vol. 5, no. 4

Abstract

Introduction: The COVID-19 pandemic had clear impacts on mental health. Social media presents an opportunity for assessing mental health at the population level.

Objectives: 1) Identify and describe language used on social media that is associated with discourse about depression. 2) Describe the associations between the identified language and COVID-19 incidence over time across several geographies.

Methods: We create a word embedding based on the posts in Reddit's /r/Depression and use this word embedding to train representations of active authors. We contrast these authors against a control group and extract keywords that capture differences between the two groups. We filter these keywords for face validity and to match the character limits of an information retrieval system, Elasticsearch. We retrieve all geo-tagged posts on Twitter from April 2019 to June 2021 from Seattle, Sydney, Mumbai, and Toronto. The tweets are scored with BM25 using the keywords. We call this score rDD. We compare changes in average score over time with case counts from the pandemic's beginning through June 2021.

Results: We observe a pattern in rDD across all cities analyzed: there is an increase in rDD near the start of the pandemic, which levels off over time. However, in Mumbai we also see an increase aligned with a second wave of cases.

Conclusions: Our results are concordant with other studies which indicate that the impact of the pandemic on mental health was highest initially and was followed by recovery, largely unchanged by subsequent waves. However, in the Mumbai data we observed a substantial rise in rDD with a large second wave. Our results indicate possible uncaptured heterogeneity across geographies, and point to a need for a better understanding of this differential impact on mental health.
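
As an illustration of the scoring step described in the Methods, the following is a minimal Python sketch of textbook Okapi BM25 applied to a keyword list. It is not the authors' pipeline: the posts and keywords are hypothetical toy data, the function name `bm25_scores` is invented for this example, and the parameters k1 = 1.2 and b = 0.75 are simply Elasticsearch's documented defaults, whose internal implementation may differ in detail from this formula.

```python
import math
from collections import Counter


def bm25_scores(docs, keywords, k1=1.2, b=0.75):
    """Score each tokenized document against a keyword list with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    query = set(keywords)
    # Document frequency: number of documents that contain each keyword.
    df = {w: sum(1 for d in docs if w in d) for w in query}
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for w in query:
            if tf[w] == 0:
                continue
            # Inverse document frequency, kept non-negative by the "1 +" term.
            idf = math.log(1 + (n_docs - df[w] + 0.5) / (df[w] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[w] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[w] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy posts and keywords (assumptions, not the study's data).
posts = [
    "i feel so tired and hopeless today".split(),
    "great coffee with friends this morning".split(),
    "tired of feeling hopeless and alone".split(),
]
keywords = ["hopeless", "tired", "alone"]

scores = bm25_scores(posts, keywords)
# Average score over a batch of posts, analogous to averaging per time window.
mean_score = sum(scores) / len(scores)
```

In this sketch a post that contains none of the keywords scores zero, and posts matching more keywords score higher; averaging the per-post scores within a time window gives a single value that can be tracked over time.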

Published in International Journal of Population Data Science

ISSN: 2399-4908 (Online)
Publisher: Swansea University
Country of publisher: United Kingdom
LCC subjects: Social Sciences: Economic theory. Demography: Demography. Population. Vital events
Website: https://ijpds.org
