Brain Sciences (Mar 2023)

Using Automated Speech Processing for Repeated Measurements in a Clinical Setting of the Behavioral Variability in the Stroop Task

  • Terje B. Holmlund,
  • Alex S. Cohen,
  • Jian Cheng,
  • Peter W. Foltz,
  • Jared Bernstein,
  • Elizabeth Rosenfeld,
  • Bruno Laeng,
  • Brita Elvevåg

DOI
https://doi.org/10.3390/brainsci13030442
Journal volume & issue
Vol. 13, no. 3
p. 442

Abstract

Read online

The Stroop interference task is indispensable to current neuropsychological practice. Despite this, it is limited in its potential for repeated administration, its sensitivity and its demands on professionals and their clients. We evaluated a digital Stroop deployed using a smart device. Spoken responses were timed using automated speech recognition. Participants included adult nonpatients (N = 113; k = 5 sessions over 5 days) and patients with psychiatric diagnoses (N = 85; k = 3–4 sessions per week over 4 weeks). Traditional interference (difference in response time between color incongruent words vs. color neutral words; M = 0.121 s) and facilitation (neutral vs. color congruent words; M = 0.085 s) effects were robust and temporally stable over testing sessions (ICCs 0.50–0.86). The performance showed little relation to clinical symptoms for a two-week window for either nonpatients or patients but was related to self-reported concentration at the time of testing for both groups. Performance was also related to treatment outcomes in patients. The duration of response word utterances was longer in patients than in nonpatients. Measures of intra-individual variability showed promise for understanding clinical state and treatment outcome but were less temporally stable than measures based solely on average response time latency. This framework of remote assessment using speech processing technology enables the fine-grained longitudinal charting of cognition and verbal behavior. However, at present, there is a problematic lower limit to the absolute size of the effects that can be examined when using voice in such a brief ‘out-of-the-laboratory condition’ given the temporal resolution of the speech-to-text detection system (in this case, 10 ms). This resolution will limit the parsing of meaningful effect sizes.

Keywords