Applied Sciences (Feb 2024)
From Posts to Knowledge: Annotating a Pandemic-Era Reddit Dataset to Navigate Mental Health Narratives
Abstract
Mental illness is increasingly recognized as a substantial public health challenge worldwide. Social media platforms have become pivotal venues where individuals express their emotions, thoughts, and experiences, making them a rich resource for mental health research. This paper presents a comprehensive dataset and a data annotation methodology for exploring the underlying causes of mental health issues. We extracted over one million Reddit posts from five subreddits, spanning the pre-pandemic, pandemic, and post-pandemic periods. The posts were methodically annotated against a set of specific criteria aimed at identifying various root causes, producing a richly categorized dataset suited to detailed analysis. The complete unlabelled dataset, along with an expert-annotated subset, is prepared for public release, as outlined in the data availability section. The dataset is intended as a resource for training and fine-tuning machine learning models that identify the foundational triggers of individual mental health issues, offering insights for practical interventions and future research in this domain.
Keywords