Journal of Medical Internet Research (Aug 2024)
Identifying Reddit Users at a High Risk of Suicide and Their Linguistic Features During the COVID-19 Pandemic: Growth-Based Trajectory Model
Abstract
Background: Suicide has emerged as a critical public health concern during the COVID-19 pandemic. With social distancing measures in place, social media has become a significant platform for individuals expressing suicidal thoughts and behaviors. However, existing studies on suicide using social media data often overlook the diversity among users and the temporal dynamics of suicide risk.
Objective: By examining the variations in post volume trajectories among users on the r/SuicideWatch subreddit during the COVID-19 pandemic, this study aims to investigate the heterogeneous patterns of change in suicide risk to help identify social media users at high risk of suicide. We also characterized their linguistic features before and during the pandemic.
Methods: We collected and analyzed post data every 6 months from March 2019 to August 2022 for users on the r/SuicideWatch subreddit (N=6163). A growth-based trajectory model was then used to investigate the trajectories of post volume to identify patterns of change in suicide risk during the pandemic. Trends in linguistic features within posts were also charted and compared, and linguistic markers were identified across the trajectory groups using regression analysis.
Results: We identified 2 distinct trajectories of post volume among r/SuicideWatch subreddit users. A small proportion of users (744/6163, 12.07%) were labeled as having a high risk of suicide, showing a sharp and lasting increase in post volume during the pandemic. By contrast, most users (5419/6163, 87.93%) were categorized as being at low risk of suicide, with post volume that remained consistently low and increased only mildly during the pandemic. For most linguistic features, frequency increased in both groups at the initial stage of the pandemic. Subsequently, the rising trend continued in the high-risk group before declining, while the low-risk group showed an immediate decrease. One year after the pandemic outbreak, the 2 groups exhibited differences in their use of words related to the categories of personal pronouns; affective, social, cognitive, and biological processes; drives; relativity; time orientations; and personal concerns. In particular, the high-risk group was distinguished during this stage by its use of words related to anger (odds ratio [OR] 3.23, P<.001), sadness (OR 3.23, P<.001), health (OR 2.56, P=.005), achievement (OR 1.67, P=.049), motion (OR 4.17, P<.001), future focus (OR 2.86, P<.001), and death (OR 4.35, P<.001).
Conclusions: Based on the 2 identified trajectories of post volume during the pandemic, this study divided users on the r/SuicideWatch subreddit into groups at high and low risk of suicide. Our findings indicated heterogeneous patterns of change in suicide risk in response to the pandemic. The high-risk group also demonstrated distinct linguistic features. We recommend conducting real-time surveillance of suicide risk using social media data during future public health crises to provide timely support to individuals at potentially high risk of suicide.
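As an illustration of the data preparation implied by the Methods, the following minimal Python sketch bins posts into consecutive 6-month windows between March 2019 and August 2022 (7 windows) and builds one post volume vector per user. It is not the authors' code: the input format (pairs of user ID and post date), the window boundaries starting on March 1, 2019, and the names `window_index` and `post_volume_trajectories` are assumptions made for this example. Fitting the growth-based trajectory model itself is not shown.

```python
from collections import Counter
from datetime import date

# Assumed study period: March 2019 to August 2022 = 42 months = 7 windows of 6 months.
STUDY_START = date(2019, 3, 1)
WINDOW_MONTHS = 6
N_WINDOWS = 7


def window_index(post_date):
    """Return the 0-based 6-month window of a post, or None if outside the study period."""
    months = (post_date.year - STUDY_START.year) * 12 + (post_date.month - STUDY_START.month)
    if months < 0:
        return None
    idx = months // WINDOW_MONTHS
    return idx if idx < N_WINDOWS else None


def post_volume_trajectories(posts):
    """Map each user to a list of post counts, one per 6-month window.

    `posts` is an iterable of (user_id, post_date) pairs (hypothetical input format).
    Users with no posts inside the study period are omitted.
    """
    counts = Counter()
    users = set()
    for user_id, post_date in posts:
        idx = window_index(post_date)
        if idx is not None:
            counts[(user_id, idx)] += 1
            users.add(user_id)
    # Counter returns 0 for windows in which a user did not post.
    return {u: [counts[(u, i)] for i in range(N_WINDOWS)] for u in users}
```

Each resulting vector has 7 entries, one per window, and is the kind of per-user series that a trajectory model could take as input to separate users whose post volume rises sharply from those whose volume stays low.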