Detecting Suicidal Ideations in Online Forums with Textual and Psycholinguistic Features

Eldar Yeskuatov; Sook-Ling Chua; Lee Kien Foo

doi:10.3390/app14219911

Applied Sciences (Oct 2024)


Affiliations

Eldar Yeskuatov: Faculty of Computing and Informatics, Multimedia University, Persiaran Multimedia, Cyberjaya 63100, Malaysia
Sook-Ling Chua: Faculty of Computing and Informatics, Multimedia University, Persiaran Multimedia, Cyberjaya 63100, Malaysia
Lee Kien Foo: Faculty of Computing and Informatics, Multimedia University, Persiaran Multimedia, Cyberjaya 63100, Malaysia

DOI: https://doi.org/10.3390/app14219911
Journal volume & issue: Vol. 14, no. 21
p. 9911

Abstract


Suicide is a global public health problem that takes hundreds of thousands of lives each year. The key to effective suicide prevention is early detection of suicidal ideation and timely intervention. However, several factors hinder traditional suicide risk screening methods. Chief among them is the social stigma associated with suicide, which makes suicidal ideation difficult to detect because existing methods require patients to communicate their suicidal propensities explicitly. In contrast, an increasing number of at-risk people choose online platforms such as Reddit as their preferred avenues for sharing their suicidal experiences and seeking emotional support. As a result, these platforms have become an unobtrusive source of user-generated text that can be used to detect suicidality with supervised machine learning and natural language processing techniques. In this paper, we propose a suicidal ideation detection approach that combines textual and psycholinguistic features extracted from Reddit. We then select the most informative features using the Boruta algorithm and employ four classifiers: logistic regression, naïve Bayes, support vector machines, and random forest. The naïve Bayes models trained with the combination of term frequency-inverse document frequency (TF-IDF) and National Research Council (NRC) features demonstrated the highest performance, obtaining an F1 score of 70.99%. Our experimental results show that combining textual and psycholinguistic features yields better classification performance than using either feature type alone.
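The pipeline summarized in the abstract (TF-IDF textual features combined with NRC lexicon features and fed to a naïve Bayes classifier) can be sketched in standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `TOY_LEXICON` is a hypothetical five-word stand-in for the NRC Emotion Lexicon (which tags English words with eight emotions and two sentiments), the four training posts are invented toy examples, TF-IDF uses raw term counts with a scikit-learn-style smoothed IDF and no length normalization, and the Boruta feature-selection step is omitted entirely. The function names `fit_idf`, `features`, `train_nb`, and `predict_nb` are illustrative only.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL toy lexicon standing in for the NRC Emotion Lexicon.
# Maps a word to the emotion/sentiment categories it is associated with.
TOY_LEXICON = {
    "hopeless": {"sadness", "negative"},
    "alone": {"sadness", "negative"},
    "die": {"fear", "sadness", "negative"},
    "happy": {"joy", "positive"},
    "help": {"trust", "positive"},
}
EMOTIONS = sorted({e for tags in TOY_LEXICON.values() for e in tags})


def tokenize(text):
    """Lowercase and split into alphabetic tokens (apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())


def fit_idf(docs):
    """Smoothed IDF, ln((1 + n) / (1 + df)) + 1, as in scikit-learn's default."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(tokenize(d)))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def features(text, idf):
    """Combine TF-IDF weights (textual) with lexicon counts (psycholinguistic)."""
    toks = tokenize(text)
    # Textual features: raw term frequency times IDF; unseen words are dropped.
    feats = {f"tfidf:{t}": c * idf[t] for t, c in Counter(toks).items() if t in idf}
    # Psycholinguistic features: how many tokens map to each emotion category.
    emo = Counter(e for t in toks for e in TOY_LEXICON.get(t, ()))
    for e in EMOTIONS:
        feats[f"nrc:{e}"] = float(emo[e])
    return feats


def train_nb(X, y, alpha=1.0):
    """Multinomial naive Bayes with Laplace (add-alpha) smoothing."""
    vocab = sorted({f for x in X for f in x})
    prior, loglik = {}, {}
    for c in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == c]
        prior[c] = math.log(len(rows) / len(X))
        totals = Counter()
        for x in rows:
            totals.update(x)  # sums (possibly fractional) feature weights
        denom = sum(totals.values()) + alpha * len(vocab)
        loglik[c] = {f: math.log((totals[f] + alpha) / denom) for f in vocab}
    return prior, loglik


def predict_nb(model, x):
    """Return the class with the highest log-posterior score."""
    prior, loglik = model

    def score(c):
        return prior[c] + sum(v * loglik[c][f] for f, v in x.items() if f in loglik[c])

    return max(prior, key=score)


# Invented toy posts; label 1 = suicidal ideation, 0 = non-suicidal.
train = [
    ("i feel hopeless and alone i want to die", 1),
    ("nobody would care if i die i am so alone", 1),
    ("had a happy day with friends", 0),
    ("looking for help with my homework happy to chat", 0),
]
idf = fit_idf([text for text, _ in train])
X = [features(text, idf) for text, _ in train]
y = [label for _, label in train]
model = train_nb(X, y)

print(predict_nb(model, features("i feel so alone and hopeless", idf)))  # prints 1
```

Multinomial naïve Bayes accepts any non-negative feature values, so fractional TF-IDF weights and integer lexicon counts can share one feature space; prefixing keys with `tfidf:` and `nrc:` keeps the two groups distinct, which mirrors the idea of concatenating textual and psycholinguistic features before classification.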

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
