Unsupervised Machine Learning to Identify Depressive Subtypes

Benson Kung; Maurice Chiang; Gayan Perera; Megan Pritchard; Robert Stewart

doi:10.4258/hir.2022.28.3.256

Healthcare Informatics Research (Jul 2022)

Affiliations

Benson Kung: Carbon Health, San Mateo, CA, USA
Maurice Chiang: Carbon Health, San Mateo, CA, USA
Gayan Perera: Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London, UK
Megan Pritchard: Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London, UK
Robert Stewart: Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London, UK

DOI: https://doi.org/10.4258/hir.2022.28.3.256
Journal volume & issue: Vol. 28, no. 3, pp. 256–266

Abstract

Objectives: This study evaluated latent Dirichlet allocation (LDA), an unsupervised machine learning method, as a means of identifying subtypes of depression within symptom data.

Methods: Data from 18,314 depressed patients were used to create LDA models. The outcomes included future emergency presentations, crisis events, and behavioral problems. One model was chosen for further analysis based on its potential as a clinically meaningful construct. The associations between the patient groups created with the final LDA model and the outcomes were then tested. These steps were repeated with a commonly used latent variable model to provide additional context for the LDA results.

Results: Five subtypes were identified using the final LDA model. Prior to the outcome analysis, the subtypes were labeled based on the symptom distributions they produced: psychotic, severe, mild, agitated, and anergic-apathetic. The patient groups largely aligned with the outcome data. For example, the psychotic and severe subgroups were more likely to have emergency presentations (odds ratio [OR] = 1.29; 95% confidence interval [CI], 1.17–1.43 and OR = 1.16; 95% CI, 1.05–1.29, respectively), whereas these outcomes were less likely in the mild subgroup (OR = 0.86; 95% CI, 0.78–0.94). The LDA subtypes were characterized by clusters of unique symptoms. This contrasted with the latent variable model subtypes, which were largely stratified by severity.

Conclusions: This study suggests that LDA can surface clinically meaningful, qualitative subtypes. Future work could be incorporated into studies concerning the biological bases of depression, thereby contributing to the development of new psychiatric therapeutics.
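To illustrate the general idea of applying LDA to symptom data, the following is a minimal sketch in which each patient is treated as a "document" and each recorded symptom as a "word". It is not the authors' implementation: the abstract does not state the software, inference method, or hyperparameters used. Collapsed Gibbs sampling is chosen here only because it is one standard way to fit LDA, and the symptom names, patient records, function name `fit_lda`, and parameter values are all invented for illustration.

```python
import random


def fit_lda(docs, n_topics, n_vocab, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative, hypothetical helper).

    docs: list of lists of integer symptom ids, one inner list per patient.
    Returns (theta, phi): per-patient subtype weights and
    per-subtype symptom distributions.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]          # counts per patient
    topic_word = [[0] * n_vocab for _ in range(n_topics)]  # counts per subtype
    topic_total = [0] * n_topics
    assign = []

    # Random initial subtype assignment for every recorded symptom.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_d.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(z_d)

    # Resample each assignment conditioned on all the others.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + n_vocab * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed point estimates from the final counts.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]
    phi = [
        [(c + beta) / (topic_total[k] + n_vocab * beta) for c in topic_word[k]]
        for k in range(n_topics)
    ]
    return theta, phi


# Invented toy data: NOT taken from the study.
symptoms = ["hallucinations", "delusions", "paranoia",
            "low_energy", "apathy", "poor_concentration"]
patients = [[0, 1, 2], [0, 1], [1, 2, 0], [3, 4, 5], [3, 4], [4, 5, 3], [0, 1, 4]]

theta, phi = fit_lda(patients, n_topics=2, n_vocab=len(symptoms))

# Assign each patient to the subtype with the largest weight.
groups = [max(range(len(row)), key=lambda k: row[k]) for row in theta]

for k, dist in enumerate(phi):
    top = sorted(range(len(symptoms)), key=lambda w: dist[w], reverse=True)[:3]
    print(f"subtype {k}:", [symptoms[w] for w in top])
print("patient groups:", groups)
```

In this sketch, `phi` plays the role of the symptom distributions from which subtypes would be labeled, and `groups` corresponds to the patient groups whose association with outcomes could then be tested, for example with odds ratios as reported above.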

Published in Healthcare Informatics Research

ISSN: 2093-3681 (Print); 2093-369X (Online)
Publisher: The Korean Society of Medical Informatics
Country of publisher: Korea, Republic of
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: http://www.e-hir.org
