Digital Health (Jul 2022)
Analysis of depression in social media texts through the Patient Health Questionnaire-9 and natural language processing
Abstract
Objective Although depression is emerging as a major social problem, the rate of mental health service use remains low. The purpose of this study was to classify sentences written by social media users according to the nine depression symptoms of the Patient Health Questionnaire-9 using natural language processing, and to assess users' depression naturally from the classification results. Methods First, we trained two sentence classifiers: the Y/N sentence classifier, which determines whether a user's sentence is related to depression, and the 0–9 sentence classifier, which further categorizes the sentence according to the depression symptomatology of the Patient Health Questionnaire-9. We then built the depression classifier, a logistic regression model, to classify whether the writer of the sentences is depressed. The trained sentence classifiers and the depression classifier were used to analyze users' social media text and determine their depression status. Results The proposed depression classifier achieved an average accuracy of 68.3%, compared with 53.3% for a baseline depression classifier that used only the Y/N sentence classifier. Conclusions This study is significant in that it demonstrates the possibility of determining depression from social media users' textual data alone.
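The two-stage design described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the keyword-based functions `toy_yn_classifier` and `toy_symptom_classifier` are hypothetical stand-ins for the trained sentence classifiers, label 0 is assumed to mean "no specific symptom", the per-user feature vector (the proportion of a user's sentences assigned to each of the nine symptoms) is an assumed feature design, and the four toy users are invented data. Only the overall structure, sentence classifiers feeding a logistic regression model, comes from the abstract.

```python
import math
from typing import Callable, List, Sequence, Tuple

NUM_SYMPTOMS = 9  # the PHQ-9 has nine symptom items

# Hypothetical keyword rules standing in for the trained sentence classifiers.
TOY_KEYWORDS = {"hopeless": 2, "sleep": 3, "tired": 4}


def toy_yn_classifier(sentence: str) -> bool:
    """Stand-in for the Y/N classifier: is the sentence depression-related?"""
    return any(k in sentence.lower() for k in TOY_KEYWORDS)


def toy_symptom_classifier(sentence: str) -> int:
    """Stand-in for the 0-9 classifier (0 is assumed to mean no symptom)."""
    for keyword, label in TOY_KEYWORDS.items():
        if keyword in sentence.lower():
            return label
    return 0


def user_features(
    sentences: Sequence[str],
    yn: Callable[[str], bool] = toy_yn_classifier,
    symptom: Callable[[str], int] = toy_symptom_classifier,
) -> List[float]:
    """Assumed feature design: share of a user's sentences per symptom."""
    counts = [0.0] * NUM_SYMPTOMS
    for s in sentences:
        if not yn(s):
            continue  # only depression-related sentences reach stage two
        label = symptom(s)
        if 1 <= label <= NUM_SYMPTOMS:
            counts[label - 1] += 1.0
    total = max(len(sentences), 1)
    return [c / total for c in counts]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(
    X: Sequence[Sequence[float]], y: Sequence[int],
    lr: float = 0.5, epochs: int = 3000,
) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return 1 (depressed) if the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Invented toy users: (sentences, label), where 1 means depressed.
toy_users = [
    (["I feel hopeless today", "I cannot sleep at night", "So tired all the time"], 1),
    (["Feeling hopeless again", "Went for a walk"], 1),
    (["Great game last night", "Lunch was delicious"], 0),
    (["New phone arrived", "Weekend trip was fun", "Love this song"], 0),
]
X = [user_features(sentences) for sentences, _ in toy_users]
y = [label for _, label in toy_users]
w, b = train_logistic_regression(X, y)
predictions = [predict(w, b, x) for x in X]
```

Under these assumptions, a baseline comparable to the one mentioned in the Results could be approximated by replacing the nine symptom proportions with a single feature: the share of sentences the Y/N classifier flags as depression-related.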