Analysis of depression in social media texts through the Patient Health Questionnaire-9 and natural language processing

Nam Hyeok Kim; Ji Min Kim; Da Mi Park; Su Ryeon Ji; Jong Woo Kim

doi:10.1177/20552076221114204

Digital Health (Jul 2022)

Affiliations

Nam Hyeok Kim: Department of Mathematics, Seoul, Republic of Korea
Ji Min Kim: Business Administration, Seoul, Republic of Korea
Da Mi Park: Business Administration, Seoul, Republic of Korea
Su Ryeon Ji: Department of Mathematics, Seoul, Republic of Korea
Jong Woo Kim: School of Business, Seoul, Republic of Korea

Journal volume & issue: Vol. 8

Abstract

Objective: Although depression is emerging as a major social problem, the rate of use of mental health services remains low. The purpose of this study was to classify sentences written by social media users according to the nine symptoms of depression in the Patient Health Questionnaire-9, using natural language processing, and to assess users' depression naturally from the classification results.

Methods: First, two sentence classifiers were trained: the Y/N sentence classifier, which determines whether a user's sentence is related to depression, and the 0–9 sentence classifier, which further categorizes the sentence according to the depression symptomology of the Patient Health Questionnaire-9. Then a depression classifier, a logistic regression model, was built to classify whether the writer of the sentences is depressed. The trained sentence classifiers and the depression classifier were used to analyze users' social media textual data and determine whether the users were depressed.

Results: The proposed depression classifier achieved 68.3% average accuracy, outperforming the baseline depression classifier, which used only the Y/N sentence classifier and achieved 53.3% average accuracy.

Conclusions: This study is significant in that it demonstrates the possibility of determining depression from social media users' textual data alone.
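The final step of the method summarized in the abstract, going from per-sentence symptom labels to a user-level depression estimate, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that label 0 means "no symptom" and labels 1 to 9 correspond to the nine Patient Health Questionnaire-9 items, that the features are per-symptom proportions, and that the logistic regression weights are already fitted. The function names `symptom_features` and `logistic_probability` and the example weights are hypothetical.

```python
import math
from collections import Counter

NUM_SYMPTOMS = 9  # the PHQ-9 has nine symptom items


def symptom_features(labels):
    """Convert one user's per-sentence labels into nine proportions.

    Assumed label scheme (not confirmed by the abstract):
    0 = sentence shows no symptom, 1..9 = PHQ-9 item number.
    """
    if not labels:
        return [0.0] * NUM_SYMPTOMS
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(k, 0) / total for k in range(1, NUM_SYMPTOMS + 1)]


def logistic_probability(features, weights, bias):
    """Logistic regression score: sigmoid of the weighted feature sum."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical sentence labels for one user and placeholder weights.
labels = [0, 2, 2, 9]
features = symptom_features(labels)
probability = logistic_probability(features, weights=[1.0] * NUM_SYMPTOMS, bias=-0.5)
print(features, probability)
```

In practice the weights and bias would be learned from labeled users, and the labels would come from the trained sentence classifiers rather than being written by hand.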

Published in Digital Health

ISSN: 2055-2076 (Online)
Publisher: SAGE Publishing
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: https://journals.sagepub.com/home/dhj