A novel multi-model feature generation technique for suicide detection

Ting Ding; Tonghui Qu; Zongliang Zou; Cheng Ding

doi:10.7717/peerj-cs.2301

PeerJ Computer Science (Oct 2024)


Affiliations

Ting Ding: School of Earth Science, East China University of Technology, Nanchang, Jiangxi, China
Tonghui Qu: Hangzhou Hikvision Digital Technology, Hangzhou, China
Zongliang Zou: School of Earth Science, East China University of Technology, Nanchang, Jiangxi, China
Cheng Ding: Department of Biomedical Engineering, Emory University, Atlanta, GA, United States of America

DOI: https://doi.org/10.7717/peerj-cs.2301
Journal volume & issue: Vol. 10, p. e2301

Abstract


Automated expert systems (AES) that analyze depression-related content on social media have attracted growing research interest. Depression is often linked to suicide, so early prediction can enable potentially life-saving interventions. Conventionally, psychologists assess depression levels through patient interviews or questionnaires. This approach has limitations: patients may not feel comfortable disclosing their true feelings, and counselors may struggle to make accurate predictions from limited data. Social media is a potentially valuable alternative source of information. Because social media is so widely used in daily life, individuals often express their nature and mental state through their online posts, and AES can efficiently analyze large volumes of such content to predict depression levels at an early stage. This study proposes an approach for predicting suicide risk from social media content using machine learning. A novel multi-model feature generation technique is employed to improve the performance of machine learning models. The technique combines term frequency-inverse document frequency (TF-IDF) feature extraction with two machine learning models, logistic regression (LR) and support vector machine (SVM), to calculate probabilities for each sample in the dataset. These probabilities form a new feature set, referred to as the probability-based feature set (ProBFS), which is compact yet highly correlated with the target classes. Using ProBFS, the SVM model achieves an accuracy of 0.96 with a computational time of 5.63 seconds, even on extensive datasets. A comparison with state-of-the-art approaches is also conducted to demonstrate the significance of the proposed method.
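
The abstract gives no implementation details, so the following is only a minimal, standard-library Python sketch of how a probability-based feature set could be assembled: TF-IDF vectors are fed to a logistic-regression model and a linear SVM, and each sample's two outputs become a two-column feature row on which a final SVM is trained. The toy posts, the function names (`tfidf_fit`, `tfidf_transform`, `train_linear`, `probfs`), the hyperparameters, the gradient-descent trainers, and the plain sigmoid used to turn SVM margins into probability-like scores are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter

# Toy, made-up posts and labels (1 = at-risk, 0 = not); purely illustrative.
posts = ["i feel hopeless and empty", "feeling hopeless again tonight",
         "great hike with friends today", "enjoyed dinner with friends"]
labels = [1, 1, 0, 0]
docs = [p.split() for p in posts]

def tfidf_fit(docs):
    """Learn a vocabulary index and smoothed IDF weights from tokenized docs."""
    df = Counter(tok for d in docs for tok in set(d))
    vocab = {tok: i for i, tok in enumerate(sorted(df))}
    n = len(docs)
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    idf = [0.0] * len(vocab)
    for tok, i in vocab.items():
        idf[i] = math.log((1 + n) / (1 + df[tok])) + 1.0
    return vocab, idf

def tfidf_transform(docs, vocab, idf):
    """Map each doc to an L2-normalized TF-IDF vector."""
    rows = []
    for d in docs:
        vec = [0.0] * len(vocab)
        for tok, count in Counter(d).items():
            if tok in vocab:
                vec[vocab[tok]] = count * idf[vocab[tok]]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        rows.append([v / norm for v in vec])
    return rows

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def decision(X, model):
    """Raw linear scores w.x + b for every row of X."""
    w, b = model
    return [sum(wj * xj for wj, xj in zip(w, xi)) + b for xi in X]

def train_linear(X, y, loss, epochs=200, lr=0.5, l2=1e-3):
    """Full-batch (sub)gradient descent on a linear model.
    loss="log" gives logistic regression; loss="hinge" gives a linear SVM."""
    n = len(X)
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [l2 * wj for wj in w], 0.0
        for z, xi, yi in zip(decision(X, (w, b)), X, y):
            if loss == "log":
                g = sigmoid(z) - yi            # gradient of log-loss w.r.t. z
            else:
                s = 2 * yi - 1                 # map {0, 1} labels to {-1, +1}
                g = -s if s * z < 1 else 0.0   # hinge-loss subgradient
            for j, xj in enumerate(xi):
                gw[j] += g * xj / n
            gb += g / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

def probfs(X, lr_model, svm_model):
    """One row per sample: [LR score, SVM score], both squashed into (0, 1).
    Assumption: a plain sigmoid stands in for a calibrated SVM probability."""
    return [[sigmoid(a), sigmoid(c)]
            for a, c in zip(decision(X, lr_model), decision(X, svm_model))]

vocab, idf = tfidf_fit(docs)
X = tfidf_transform(docs, vocab, idf)
lr_model = train_linear(X, labels, "log")
svm_model = train_linear(X, labels, "hinge")
F = probfs(X, lr_model, svm_model)              # compact 2-column feature set

# Final classifier trained on the compact features instead of raw TF-IDF.
final_model = train_linear(F, labels, "hinge")
pred = [1 if z > 0 else 0 for z in decision(F, final_model)]
for row, p in zip(F, pred):
    print([round(v, 3) for v in row], p)
```

In this sketch the feature matrix shrinks from one column per vocabulary term to two columns per sample, which is one plausible reading of why ProBFS is described as compact. The abstract does not say how probabilities are produced for training samples; generating them on held-out folds is a common precaution against leakage and would be a natural refinement here.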

Published in PeerJ Computer Science

ISSN: 2376-5992 (Online)
Publisher: PeerJ Inc.
Country of publisher: United States
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://peerj.com/computer-science/
