PLoS ONE (Jan 2020)

Integrating questionnaire measures for transdiagnostic psychiatric phenotyping using word2vec.

  • Aaron Sonabend W,
  • Amelia M Pellegrini,
  • Stephanie Chan,
  • Hannah E Brown,
  • James N Rosenquist,
  • Pieter J Vuijk,
  • Alysa E Doyle,
  • Roy H Perlis,
  • Tianxi Cai

DOI
https://doi.org/10.1371/journal.pone.0230663
Journal volume & issue
Vol. 15, no. 4
p. e0230663

Abstract

BACKGROUND: Recent initiatives in psychiatry emphasize the utility of characterizing psychiatric symptoms in a multidimensional manner. However, strategies for applying standard self-report scales for multiaxial assessment have not been well-studied, particularly where the aim is to support both categorical and dimensional phenotypes.

METHODS: We propose a method for applying natural language processing to derive dimensional measures of psychiatric symptoms from questionnaire data. We utilized nine self-report symptom measures drawn from a large cellular biobanking study that enrolled individuals with mood and psychotic disorders, as well as healthy controls. To summarize questionnaire results we used word embeddings, a technique to represent words as numeric vectors preserving semantic and syntactic meaning. A low-dimensional approximation to the embedding space was used to derive the proposed succinct summary of symptom profiles. To validate our embedding-based disease profiles, these were compared to presence or absence of axis I diagnoses derived from structured clinical interview, and to objective neurocognitive testing.

RESULTS: Unsupervised and supervised classification to distinguish presence/absence of axis I disorders using survey-level embeddings remained discriminative, with area under the receiver operating characteristic curve up to 0.85, 95% confidence interval (CI) (0.74, 0.91) using Gaussian mixture modeling, and cross-validated area under the receiver operating characteristic curve 0.91, 95% CI (0.88, 0.94) using logistic regression. Derived symptom measures and estimated Research Domain Criteria scores also associated significantly with performance on neurocognitive tests.

CONCLUSIONS: Our results support the potential utility of deriving dimensional phenotypic measures in psychiatric illness through the use of word embeddings, while illustrating the challenges in identifying truly orthogonal dimensions.
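
As a rough illustration of the idea described in the methods, the sketch below builds a survey-level embedding from questionnaire items and scores how well a one-dimensional summary separates cases from controls. It is not the authors' implementation. The three-dimensional word vectors, the item wording, the response-weighted averaging scheme, and the function names (item_embedding, survey_embedding, auc) are all assumptions made for illustration; real word2vec vectors are learned from a corpus and have far more dimensions, and the low-dimensional approximation, Gaussian mixture, and logistic regression steps are not reproduced here.

```python
from statistics import fmean

# Hypothetical toy word vectors (assumption): stand-ins for learned word2vec vectors.
TOY_VECTORS = {
    "sad": [0.9, 0.1, 0.0],
    "hopeless": [0.8, 0.2, 0.1],
    "worried": [0.2, 0.9, 0.0],
    "nervous": [0.1, 0.8, 0.1],
    "feel": [0.3, 0.3, 0.3],
}


def item_embedding(text, vectors):
    """Average the vectors of the known words in one questionnaire item."""
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    if not known:
        raise ValueError(f"no known words in item: {text!r}")
    return [fmean(dim) for dim in zip(*known)]


def survey_embedding(items, responses, vectors):
    """Response-weighted average of item embeddings for one respondent.

    The weighting scheme is an assumption, not taken from the paper.
    """
    embedded = [item_embedding(text, vectors) for text in items]
    n_dims = len(embedded[0])
    total = sum(responses)
    if total == 0:
        return [0.0] * n_dims
    return [
        sum(r * e[d] for r, e in zip(responses, embedded)) / total
        for d in range(n_dims)
    ]


def auc(scores, labels):
    """Area under the ROC curve: the probability that a positive case
    scores higher than a negative case, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    items = ["feel sad", "feel hopeless", "feel worried"]
    respondent = survey_embedding(items, [3, 3, 0], TOY_VECTORS)
    print("survey-level embedding:", respondent)
    # Use the first embedding coordinate as a one-dimensional score.
    print("AUC:", auc([0.9, 0.2, 0.3, 0.1], [1, 1, 0, 0]))
```

In a fuller pipeline, the survey-level vectors would be the inputs to a dimensionality reduction step and then to a classifier, with the AUC computed on held-out predictions rather than on a raw coordinate.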