Scientific Reports (May 2024)
Interdisciplinary approach to identify language markers for post-traumatic stress disorder using machine learning and deep learning
Abstract
Abstract Post-traumatic stress disorder (PTSD) lacks clear biomarkers in clinical practice. Language as a potential diagnostic biomarker for PTSD is investigated in this study. We analyze an original cohort of 148 individuals exposed to the November 13, 2015, terrorist attacks in Paris. The interviews, conducted 5–11 months after the event, include individuals from similar socioeconomic backgrounds exposed to the same incident, responding to identical questions and using uniform PTSD measures. Using this dataset to collect nuanced insights that might be clinically relevant, we propose a three-step interdisciplinary methodology that integrates expertise from psychiatry, linguistics, and the Natural Language Processing (NLP) community to examine the relationship between language and PTSD. The first step assesses a clinical psychiatrist's ability to diagnose PTSD using interview transcription alone. The second step uses statistical analysis and machine learning models to create language features based on psycholinguistic hypotheses and evaluate their predictive strength. The third step is the application of a hypothesis-free deep learning approach to the classification of PTSD in our cohort. Results show that the clinical psychiatrist achieved a diagnosis of PTSD with an AUC of 0.72. This is comparable to a gold standard questionnaire (Area Under Curve (AUC) ≈ 0.80). The machine learning model achieved a diagnostic AUC of 0.69. The deep learning approach achieved an AUC of 0.64. An examination of model error informs our discussion. Importantly, the study controls for confounding factors, establishes associations between language and DSM-5 subsymptoms, and integrates automated methods with qualitative analysis. This study provides a direct and methodologically robust description of the relationship between PTSD and language. Our work lays the groundwork for advancing early and accurate diagnosis and using linguistic markers to assess the effectiveness of pharmacological treatments and psychotherapies.