Optimizing the predictive power of depression screenings using machine learning

Yannik Terhorst; Lasse B Sander; David D Ebert; Harald Baumeister

doi:10.1177/20552076231194939

Digital Health (Aug 2023)

Affiliations

Yannik Terhorst: Department of Clinical Psychology and Psychotherapy, Institute of Psychology and Education, Ulm, Germany
Lasse B Sander: Medical Psychology and Medical Sociology, Faculty of Medicine, Freiburg, Germany
David D Ebert: Department for Sport and Health Sciences, Chair for Psychology & Digital Mental Health Care, Munich, Germany
Harald Baumeister: Department of Clinical Psychology and Psychotherapy, Institute of Psychology and Education, Ulm, Germany

DOI: https://doi.org/10.1177/20552076231194939
Journal volume & issue: Vol. 9

Abstract

Objective: Mental health self-report and clinician-rating scales, with diagnoses defined by sum-score cut-offs, are often used for depression screening. This study investigates whether machine learning (ML) can detect major depressive episodes (MDE) from screening scales with higher accuracy than best-practice clinical sum-score approaches.

Methods: Primary data were obtained from two RCTs on the treatment of depression. The ground truth was DSM-5 MDE diagnoses based on structured clinical interviews (SCID); the PHQ-9 (self-report), the clinician-rated QIDS-16, and the HAM-D-17 served as predictors. ML models were trained using 10-fold cross-validation, and their performance was compared against best-practice sum-score cut-offs. The primary outcome was the area under the receiver operating characteristic curve (AUC). DeLong's test with bootstrapping was used to test for differences in AUC. Secondary outcomes were balanced accuracy, precision, recall, F1-score, and number needed to diagnose (NND).

Results: A total of k = 1030 diagnoses (no diagnosis: k = 775; MDE: k = 255) were included. In the testing set, the ML models achieved AUCs of 0.94 (QIDS-16), 0.88 (HAM-D-17), and 0.83 (PHQ-9). The ML AUC was significantly higher than that of the sum-score cut-offs for the QIDS-16 and PHQ-9 (ps ≤ 0.01), but not for the HAM-D-17 (p = 0.847). With optimal prediction thresholds applied, the QIDS-16 classifier achieved clinically relevant improvements (Δ balanced accuracy = 8%, Δ F1-score = 14%, Δ NND = 21%). Differences for the PHQ-9 and HAM-D-17 were marginal.

Conclusions: ML-augmented depression screening could potentially make a major contribution to improving MDE diagnosis, depending on the questionnaire (e.g., QIDS-16). Confirmatory studies are needed before ML-enhanced screening can be implemented in routine care practice.
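
All of the secondary outcomes can be derived from a 2×2 confusion matrix of predicted versus SCID-confirmed MDE status. The sketch below is illustrative only: it assumes binary labels (1 = MDE, 0 = no diagnosis), applies a hypothetical sum-score cut-off of 10 to made-up toy data rather than the study's cut-offs or data, and takes NND as 1 / (sensitivity + specificity - 1), the usual definition, which is assumed here because the abstract does not define it. The helper names `classify_by_cutoff` and `screening_metrics` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScreeningMetrics:
    balanced_accuracy: float
    precision: float
    recall: float  # sensitivity
    f1: float
    nnd: float  # number needed to diagnose


def classify_by_cutoff(sum_scores, cutoff):
    """Hypothetical helper: flag MDE (1) when the sum score reaches the cut-off."""
    return [1 if s >= cutoff else 0 for s in sum_scores]


def screening_metrics(y_true, y_pred):
    """Derive the secondary outcomes from binary labels (1 = MDE, 0 = no diagnosis)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Assumed definition: NND = 1 / Youden's J; reported as inf when J <= 0.
    youden_j = recall + specificity - 1
    nnd = 1 / youden_j if youden_j > 0 else float("inf")
    return ScreeningMetrics((recall + specificity) / 2, precision, recall, f1, nnd)


# Toy data (not from the study): 8 participants, illustrative cut-off of 10.
labels = [0, 1, 0, 1, 0, 0, 1, 1]
scores = [3, 12, 8, 15, 11, 5, 9, 20]
print(screening_metrics(labels, classify_by_cutoff(scores, 10)))
```

On this toy data the cut-off yields three true positives, three true negatives, one false positive, and one false negative, so balanced accuracy, precision, recall, and F1 all equal 0.75 and NND is 2.0. The same function could score an ML classifier's thresholded predictions, which is what makes differences such as Δ F1-score or Δ NND comparable across methods.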
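
The AUC can be read as the probability that a randomly chosen MDE case receives a higher score than a randomly chosen non-case. The following sketch computes it that way and compares two scoring methods applied to the same participants with a paired bootstrap. Note that this is a plain bootstrap test of the AUC difference (the observed difference divided by the standard deviation of bootstrap differences), not DeLong's analytic variance estimator, and it is not claimed to reproduce the study's exact procedure; the function names, the resample count, and the seed are assumptions.

```python
import math
import random


def auc(y_true, scores):
    """ROC AUC as P(score of a random MDE case > score of a random non-case); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_test(y_true, scores_a, scores_b, n_boot=2000, seed=0):
    """Paired bootstrap test of AUC(a) - AUC(b) on the same participants.

    Not DeLong's analytic test: the observed difference is divided by the
    standard deviation of bootstrap differences and referred to a standard normal.
    """
    rng = random.Random(seed)
    n = len(y_true)
    observed = auc(y_true, scores_a) - auc(y_true, scores_b)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample participants, keep pairing
        yb = [y_true[i] for i in idx]
        if 0 < sum(yb) < n:  # skip resamples that lack one of the two classes
            diffs.append(auc(yb, [scores_a[i] for i in idx]) - auc(yb, [scores_b[i] for i in idx]))
    mean = sum(diffs) / n_boot
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n_boot - 1))
    if sd == 0:
        raise ValueError("bootstrap AUC differences have zero variance")
    z = observed / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return observed, z, p_value
```

Called with the SCID labels and two score vectors for the same people (for example, an ML model's predicted probabilities and the raw sum scores), it returns the observed AUC difference, the z statistic, and a two-sided p-value. Resampling whole participants keeps the two scores paired, which matters because both methods are evaluated on the same diagnoses.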
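
The abstract does not state how the "optimal prediction thresholds" were chosen. One common choice, assumed here purely for illustration, is the threshold that maximizes Youden's J = sensitivity + specificity - 1; if NND is defined as 1 / J, this is also the threshold that minimizes NND. The minimal sketch below scans candidate thresholds over predicted probabilities; the function name, the `>=` decision rule, and the tie-breaking rule are hypothetical.

```python
def youden_optimal_threshold(y_true, probs):
    """Return (threshold, J) maximizing Youden's J = sensitivity + specificity - 1.

    Assumptions for illustration: a case is predicted MDE when prob >= threshold,
    and ties in J keep the lowest such threshold.
    """
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both MDE and non-MDE cases")
    best_t, best_j = None, float("-inf")
    for t in sorted(set(probs)):  # every observed probability is a candidate threshold
        tp = sum(1 for y, p in zip(y_true, probs) if y == 1 and p >= t)
        tn = sum(1 for y, p in zip(y_true, probs) if y == 0 and p < t)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


threshold, j = youden_optimal_threshold([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(threshold, j, 1 / j)  # prints: 0.35 0.5 2.0 (threshold, J, NND)
```

In practice such a threshold would be chosen on training folds and then applied unchanged to held-out data, so that the reported test-set metrics are not optimistically biased.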

Published in Digital Health

ISSN: 2055-2076 (Online)
Publisher: SAGE Publishing
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: https://journals.sagepub.com/home/dhj