Journal of Affective Disorders Reports (Apr 2024)
The prediction of self-harm behaviors in young adults with multi-modal data: an XGBoost approach
Abstract
Objectives: To improve the prediction of self-harm behaviors using multidimensional data and machine learning methods, and to provide a foundation for future comprehensive interventions.
Methods: One hundred and twelve young adults aged 18-22 years with self-harm behaviors formed the self-harm group, and 98 participants formed the control group. Eighty-three social-demographic and genetic features were collected and analyzed with an extreme gradient boosting (XGBoost) approach.
Results: Social-demographic and genetic features differed significantly between the self-harm and control groups (p<0.05). With the XGBoost algorithm, the model reached a sensitivity of 0.866 and a specificity of 0.734. The balanced accuracy, positive predictive value (PPV), and negative predictive value (NPV) were 0.800, 0.789, and 0.828, respectively. The 20 most important features were: suicidal and self-injurious ideation (SSI) in the past year, rs1659400 (G/A), rs6296 (G/C), family history of mental disorders, rs11140800 (C/C), rs2770296 (T/T-C/T), rs1360780 (T/T), rs1147198 (G/T-G/G), SSI in the past month, rs4675690 (T/T), aggressive personality, rs211105 (G/G), rs1042173 (A/A), psychoticism, rs11178997 (T/T), rs1387923 (A/A), college-educated, rs7728378 (C/T-T/T), quality of life, and rs1360780 (C/C).
Limitations: The limited sample size may undermine the credibility or representativeness of the findings. Moreover, the study did not take into account the duration or frequency of self-harm behaviors because of potential recall bias.
Conclusions: XGBoost is a reliable machine learning approach for analyzing multi-modal data to predict self-harm in young adults. When social-demographic factors, personality traits, and genetic features were considered together, the NTRK2 gene showed great importance in predicting self-harm in young adults.
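The performance metrics reported in the Results are related by standard formulas, which the following minimal Python sketch illustrates. It is not the authors' code. The function names are hypothetical, and the PPV/NPV check assumes that the evaluation data had the same class proportion as the full sample (112 of 210), which the abstract does not state.

```python
def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Balanced accuracy is the mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2


def ppv_npv(sensitivity: float, specificity: float, prevalence: float):
    """Derive PPV and NPV from sensitivity, specificity, and prevalence
    using Bayes' rule, expressed as expected fractions of the sample."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Values reported in the abstract.
sensitivity, specificity = 0.866, 0.734

# ASSUMPTION: prevalence in the evaluation data equals the overall
# proportion of self-harm participants (112 of 112 + 98).
prevalence = 112 / (112 + 98)

print(f"balanced accuracy: {balanced_accuracy(sensitivity, specificity):.3f}")
ppv, npv = ppv_npv(sensitivity, specificity, prevalence)
print(f"PPV: {ppv:.3f}, NPV: {npv:.3f}")
```

The balanced accuracy computed this way reproduces the reported 0.800. Under the stated prevalence assumption, the derived PPV and NPV fall within about 0.001 of the reported 0.789 and 0.828; small gaps are expected because the inputs are rounded and the actual evaluation split is not given.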