Machine Learning Driven Mental Stress Detection on Reddit Posts Using Natural Language Processing

Shaunak Inamdar; Rishikesh Chapekar; Shilpa Gite; Biswajeet Pradhan

doi:10.1007/s44230-023-00020-8

Human-Centric Intelligent Systems (Mar 2023)


Affiliations

Shaunak Inamdar: AIML Department, Symbiosis Institute of Technology, Symbiosis International (Deemed University)
Rishikesh Chapekar: AIML Department, Symbiosis Institute of Technology, Symbiosis International (Deemed University)
Shilpa Gite: AIML Department, Symbiosis Institute of Technology, Symbiosis International (Deemed University)
Biswajeet Pradhan: Centre for Advanced Modelling and Geospatial Information Systems (CAMGIS), School of Civil and Environmental Engineering, University of Technology Sydney

DOI: https://doi.org/10.1007/s44230-023-00020-8
Journal volume & issue: Vol. 3, no. 2
pp. 80 – 91

Abstract


People's mental conditions are often reflected in their social media activity, partly because of the anonymity the internet offers. Signs of psychiatric issues can be detected in such activity and addressed in their early stages, potentially preventing the consequences of unattended mental disorders such as depression and anxiety. In this paper, the authors implement machine learning models and use various embedding techniques to classify posts from the social media site Reddit as stressful or non-stressful. The dataset contains user posts that can be analyzed to detect patterns in the social media activity of those diagnosed with mental disorders. The paper uses several NLP (Natural Language Processing) tools, namely ELMo (Embeddings from Language Models) word embeddings, BERT (Bidirectional Encoder Representations from Transformers) tokenizers, and the BoW (Bag of Words) approach, to create word- and sentence-level data that can be fed to machine learning models, and the results of each method are discussed. Using only the preprocessed texts and machine learning algorithms to classify the posts, the approach achieved a top F1 score of 0.76, a Precision score of 0.71, and a Recall of 0.74. These results have the potential to be applied in real-world scenarios to analyze mental stress among social media users. Although the paper focuses on data from Reddit, the techniques can be transferred to similar social media platforms and could help address the growing mental health crisis.
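To make the Bag of Words step and the reported metrics concrete, the following is a minimal sketch, not the authors' implementation. It uses only the Python standard library. The function names, the tokenization rule, and the toy posts and labels are all hypothetical and chosen purely for illustration. It shows how posts might be turned into count vectors and how Precision, Recall, and F1 are computed from predicted labels.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a post and split it into simple word tokens (assumed rule)."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(posts):
    """Map each distinct token in the corpus to a column index."""
    tokens = sorted({tok for post in posts for tok in tokenize(post)})
    return {tok: i for i, tok in enumerate(tokens)}


def bag_of_words(post, vocab):
    """Return a count vector for one post; out-of-vocabulary tokens are ignored."""
    vector = [0] * len(vocab)
    for tok, count in Counter(tokenize(post)).items():
        if tok in vocab:
            vector[vocab[tok]] = count
    return vector


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute Precision, Recall, and F1 for the positive (stressful) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy posts, not taken from the paper's dataset.
posts = ["I am so stressed about exams", "Had a great walk today"]
vocab = build_vocabulary(posts)
vectors = [bag_of_words(post, vocab) for post in posts]

# Hypothetical labels: 1 = stressful, 0 = non-stressful.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

In practice the count vectors would be passed to a classifier, and the metrics would be computed on held-out posts rather than on toy labels like these.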

Published in Human-Centric Intelligent Systems

ISSN: 2667-1336 (Online)
Publisher: Springer Nature
Country of publisher: Netherlands
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.springer.com/journal/44230
