Efficacy and Mechanism Evaluation (Sep 2023)
The impact of the Speech Systems Approach on intelligibility for children with cerebral palsy: a secondary analysis
Abstract
Background The motor speech disorder, dysarthria, is common in cerebral palsy. The Speech Systems Approach therapy programme, which focuses on controlling breath supply and speech rate, has increased children’s intelligibility. Objective To ascertain whether increased intelligibility is due to better differentiation of the articulation of individual consonants in words spoken in isolation and in connected speech. Design Secondary analysis. Setting University. Participants Forty-two children with cerebral palsy and dysarthria aged 5–18 years, Gross Motor Function Classification System I–V. Intervention The Speech Systems Approach is a motor learning therapy delivered to individuals by a speech and language therapist in 40-minute sessions, three times per week for 6 weeks. Intervention focuses on production of a strong, clear voice and speaking at a steady rate. Practice progresses from single words to progressively longer utterances in tasks with increasing cognitive load. Main outcome measures Unfamiliar listeners’ identification of singleton consonants (e.g. nap) and clusters of consonants (e.g. stair, end) at the start and end of words when hearing single words in forced choice tasks and connected speech in free transcription tasks. Acoustic measures of sound intensity and duration. Data sources Data collected at 1 week pre- and 1 week post-therapy from three studies: two with an interrupted time series design and one feasibility randomised controlled trial. Results Word initial and word final singleton consonants and consonant clusters were better identified post-therapy. The extent of improvement differed across word initial and word final singleton consonant subtypes. Improvement was greater for single words than connected speech. Change in sound identification varied across children, particularly in connected speech. Increases in sound intensity and duration were also inconsistent. Limitations The small sample size did not allow for analysis of cerebral palsy type. 
Acoustic data were not available for all children, limiting the strength of conclusions that can be drawn. The different but phonetically balanced word lists, used in the original research, created variability in single words spoken across recordings analysed. Low frequencies of plosives, fricatives and affricates necessitated their combination for analysis preventing investigation of the effect of specific consonants. Connected speech was spontaneous, again creating variability within the data analysed. The estimated effects of therapy may therefore be partially explained by differences in the spoken language elicited. Conclusions The Speech Systems Approach helped children generate greater breath supply and a steady rate, leading to increased intensity and duration of consonant sounds in single words, thereby aiding their identification by listeners. Transfer of the motor behaviour to connected speech was inconsistent. Future work Refining the Speech Systems Approach to focus on connected speech early in the intervention. Personalisation of cues according to perceptual and acoustic speech measures. Creation of a battery of measures that can be repeated across children and multiple recordings. Study registration This trial is registered as Research Registry 6117. Funding This project was funded by the National Institute for Health and Care Research (NIHR) Efficacy and Mechanism Evaluation programme (NIHR130967) and will be published in full in Efficacy and Mechanism Evaluation; Vol. 10, No. 4. See the NIHR Journals Library website for further project information. Plain language summary Some children with cerebral palsy have speech that sounds weak, slurred and difficult to understand, which seriously impacts their social life and education. We developed a therapy programme to help children control their breathing and how fast they speak. Having more breath should make children’s voices stronger. 
Speaking at a steady rate should give enough time for children to move their jaw, tongue and lips to produce each sound more precisely. Children’s speech was easier to understand after the therapy. This study aimed to find out if the therapy worked by helping children to say consonant sounds more clearly. We used recordings made in previous research to work out which consonants listeners heard correctly. We also looked at waveforms, which showed children’s speech as moving pictures, to find out how speech changed. After therapy, when children spoke in single words, listeners heard almost all types of consonant sounds at the start and end of words more clearly. No particular type of consonant sounds, such as ‘s’ in ‘so’ or ‘t’ in ‘tar’, led to better speech clarity. Waveforms showed that some children produced stronger speech sounds, some slowed their speech, and some did both. Listeners heard some children more clearly after therapy when they spoke in phrases, but found others more difficult to understand. Few consonants were easier to understand after therapy. We saw no clear patterns of change in speech waveforms. Overall, children produced stronger, more precise speech in single words, but not all transferred this skill to speaking in phrases. Children differed in how they achieved clearer speech. We used the findings to refine the therapy to focus on phrases early in the programme and to personalise instructions to children’s individual speech patterns. We will use waveforms to find where children have most difficulty and to measure improvement. Scientific summary Background and introduction The motor disorders of cerebral palsy (CP) often affect breath control and speech production, causing the speech disorder dysarthria. Dysarthria in CP typically affects all speech systems: respiration, phonation, resonance, prosody and articulation. Respiration is often shallow and lacks co-ordination with phonation, generating weak or inconsistent subglottal pressure. 
Vocal folds may vibrate slowly and irregularly; air may leak through the folds when they should be adducted, reducing the intraoral pressure and weakening the sound source. The velum may rise slowly or fail to close off the nasal passage during speech. The movements of the articulators (jaw, tongue and lips) may be slow and imprecise. They may also be weak, reducing children’s ability to constrict the vocal tract for consonant sounds. The combined effect of these limitations is that children often speak in short phrases, with inappropriate phrasing or rushes of speech if they run out of air. Their voice may sound weak, breathy and sometimes harsh. Speech is often slow with reduced melodic intonation and children may have a restricted range of consonants that they can produce clearly. Intervention focusing on breath support and speech rate is expected to aid the co-ordination of three phases of speech production, namely initiation, phonation and articulation. Greater breath supply and increased air pressure during exhalation should increase subglottal air pressure, bringing firmer contact of the vocal folds during phonation to generate a stronger vocal note/sound source. The improved audibility and potential for greater intraoral air pressure arising from this will also help compensate for any weak closures of articulators and reduce ‘leakage’ of air during speech. A steady speech rate should allow children to move with precision from one articulatory place and manner to another. Thus, as a result of changes in breath supply and rate, phonemes should be acoustically differentiated and listeners better able to perceive the sounds that children are articulating (increased phonetic intelligibility). The Speech Systems Approach has been developed to focus on breath control and speech rate and has led to improvements in the intelligibility of children’s speech, which have been maintained for up to 12 weeks without further intervention. 
What is not yet known is whether there is a differential effect on types of speech sounds, or whether this effect is moderated by CP type or severity of impairment. Establishing this can then help with further individualisation of therapy and further gains in intelligibility. Aims The aim of the study was to ascertain if therapy focusing on breath supply and speech rate is associated with increased differentiation of the articulation of individual phonemes, enabling listeners to better identify individual phonemes in words spoken in isolation and in connected speech (CS). Methods Design The study was a secondary analysis of previously collected data from two phase II studies using an interrupted time series design and one feasibility randomised controlled trial of the Speech Systems Approach. Participants Forty-two children and young people aged 5–18 years, who had a diagnosis of CP by a medical practitioner and moderate to severe dysarthria as assessed by their local speech and language therapist, received the Speech Systems Approach in the original studies. Participants were excluded from the studies if they had hearing impairments >50 dB HL, visual impairments that were not correctable with glasses, or were unable to follow simple verbal instructions. All 42 participants were included in this secondary analysis. Intervention Children received individual therapy following the Speech Systems Approach from a registered speech and language therapist three times a week for 6 weeks. Sessions lasted approximately 40 minutes. In two of the original studies the sessions were provided face-to-face; in the third study, sessions took place remotely using video conferencing software. In the first session the therapist tried several cues to find the one that best elicited a strong, clear voice in an open vowel (ah). Cues included ‘strong’, ‘big’ and ‘loud’, and combinations of these. For children who had difficulty initiating movement, cues were ‘nice and easy’ or ‘smooth’. 
Once the most appropriate cue had been found for the child, that cue was used to elicit open vowels on command. Practice of the target voice then followed a hierarchy of increasing length of utterance and cognitive load. Children first practised in single words (SWs), then short phrases and finally longer CS. At each stage they started with repetition, moved to picture naming and description, and then questions and answers and games. Children had to use their target voice in 8 of 10 attempts to move to the next level in the hierarchy. Children were provided with knowledge of results on how their voice sounded and were encouraged to use biofeedback (‘How did that feel?’). Feedback was given frequently in the acquisition phase at each level of the hierarchy and then faded to aid retention. Sessions followed a set sequence for practice of the target voice: (1) 10 open vowels; (2) three repetitions of 10 self-selected phrases that children use in daily life; (3) 70–80 words and phrases from the speech task hierarchy; and (4) 10–20 utterances randomly selected from the three preceding tasks. Procedure Data were collected at 6 weeks and 1 week pre-therapy and at 1, 6 and 12 weeks post-therapy. The original studies suggested that gains in intelligibility were observed immediately after therapy (1 week) and were maintained at 6 and 12 weeks. For this study we analysed data from 1 week pre- and 1 week post-therapy. At each time point, participants’ speech was recorded on two separate days and elicited through two tasks. SWs were elicited using the Children’s Speech Intelligibility Measure (CSIM), which contains 200 lists of 50 words. The CSIM is a forced choice word recognition task. Listeners heard each word and selected the target word from a list of 12 phonetically similar words. CS was elicited by asking the participants to describe complex pictures and answer questions. 
The recordings were transcribed live by an expert speech and language therapist and then checked with the child, to create a gold standard transcription of target words. Up to 60 seconds of CS was presented to listeners in phrases separated by pauses of at least three seconds. The CS recognition task was open choice; listeners heard the recordings of CS and wrote down the words they perceived the child to say. Listeners were native speakers of English, aged 18–55 years, and had no hearing difficulties or regular experience of interacting with people with disabilities or speech disorders. In each of the original studies, listeners were randomly allocated three speech recordings, with the limitation that they did not hear the same child more than once. Each recording was heard by three listeners. Outcomes Perceptual data Each word produced by children in the SW and CS tasks was categorised by the single consonants and the clusters of consonants at the start and end of words. Consonants were categorised according to their voicing (voiced = vocal fold vibration; voiceless = vocal folds open), place of articulation (bilabial, labiodental, dental, alveolar, post-alveolar, velar, and glottal), and manner of articulation (plosives, fricatives, affricates, nasals and approximants). The words perceived by listeners were categorised in the same way. A review of the data showed that some manners and places of articulation rarely appeared in the SW lists. We therefore combined manners of articulation into obstruents, which demand constriction/closure of the vocal tract (plosives, fricatives and affricates), and sonorants, which allow air to flow out of the mouth or nose (approximants and nasals), and combined places of articulation into labial (bilabial, labiodental), coronal (dental, alveolar, post-alveolar) and dorsal (velar). Consonants were then further categorised according to a combination of their voicing, place, and manner characteristics (e.g. voiced labial obstruent). 
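The combined voicing, place and manner categories can be illustrated with a short Python sketch. It is not the study's coding procedure: the phoneme table is a small hypothetical subset written with plain letters, the function names are invented for illustration, and matching target and perceived consonants on the combined categories (rather than the finer ones) is an assumption. Glottal consonants are omitted because the summary does not say which combined place group they were assigned to.

```python
# Hypothetical subset of English consonants: symbol -> (voicing, place, manner).
FEATURES = {
    "p": ("voiceless", "bilabial", "plosive"),
    "b": ("voiced", "bilabial", "plosive"),
    "f": ("voiceless", "labiodental", "fricative"),
    "t": ("voiceless", "alveolar", "plosive"),
    "s": ("voiceless", "alveolar", "fricative"),
    "n": ("voiced", "alveolar", "nasal"),
    "l": ("voiced", "alveolar", "approximant"),
    "k": ("voiceless", "velar", "plosive"),
    "g": ("voiced", "velar", "plosive"),
}

# Combined categories as described in the text.
PLACE_GROUP = {
    "bilabial": "labial", "labiodental": "labial",
    "dental": "coronal", "alveolar": "coronal", "post-alveolar": "coronal",
    "velar": "dorsal",
}
MANNER_GROUP = {
    "plosive": "obstruent", "fricative": "obstruent", "affricate": "obstruent",
    "nasal": "sonorant", "approximant": "sonorant",
}


def combined(consonant):
    """Return (voicing, combined place, combined manner) for one consonant."""
    voicing, place, manner = FEATURES[consonant]
    return voicing, PLACE_GROUP[place], MANNER_GROUP[manner]


def subtype(consonant):
    """Return the combined subtype label, e.g. 'voiced labial obstruent'."""
    return " ".join(combined(consonant))


def feature_matches(target, perceived):
    """Compare target and perceived consonants feature by feature (binary outcomes)."""
    t, p = combined(target), combined(perceived)
    return {"voice": t[0] == p[0], "place": t[1] == p[1], "manner": t[2] == p[2]}


if __name__ == "__main__":
    print(subtype("b"))
    print(feature_matches("b", "p"))
```

In this sketch a target 'b' heard as 'p' counts as a match for place and manner but not for voice, which is the kind of binary outcome the perceptual analysis works with.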
Voicing, place and manner were combined in this way owing to dependencies between voice and manner, such that consonants with a sonorant manner are never voiceless. Because of this, we would not have been able to separate the main effects of voice and manner from each other had they been added as individual explanatory variables in modelling. The perceptual data were multilevel such that children were level 2 units and each target word-and-listener combination was a level 1 unit. The primary outcome for the perceptual analysis was the identification of words and segments within them (binary outcomes). Words and sounds were identified if there was a match between the target and perceived word/consonant/cluster/voice/place/manner. Secondary outcomes included the percentage identification measures. We visually inspected radar plots profiling performance on six percentage identification measures comprising the voice, place and manner of word initial and word final singleton consonants at pre- and post-therapy. Acoustic data Recordings of 24 out of the 42 children were available pre- and post-therapy and were automatically transcribed and segmented, and manually corrected. Measures of intensity (dB), duration (ms) and speech rate (seconds per syllable) were made at word and segment level. Changes in intensity and/or speech rate were monitored post-therapy and examined as a function of consonant manner, intelligibility, and groups of children based on the perceptual analyses. Data analysis Perceptual data We used generalised linear mixed modelling (GLMM) with a logit link function and random effect of child to examine the effect of the Speech Systems Approach intervention (post-therapy vs. pre-therapy) on identification of word initial and word final singleton consonants and consonant clusters and whether there was evidence that certain types of consonants benefitted more from the therapy than others. 
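Because the model uses a logit link, a therapy effect is expressed as an odds ratio rather than as a change in percentage points. The Python sketch below shows how an odds ratio maps onto a probability of identification for a given starting point. The baseline probability and the odds ratio of 1.5 are illustrative assumptions, not values from the study, and in a mixed model the conversion applies conditionally on a given child's random effect.

```python
def post_therapy_probability(baseline_probability, odds_ratio):
    """Apply an odds ratio to a baseline probability and return the new probability.

    Assumes 0 < baseline_probability < 1. Both inputs are hypothetical
    illustration values, not estimates reported in the study.
    """
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    post_odds = baseline_odds * odds_ratio
    return post_odds / (1.0 + post_odds)


if __name__ == "__main__":
    # A consonant identified half of the time before therapy, with an
    # illustrative odds ratio of 1.5, would be identified about 60% of the time.
    for baseline in (0.2, 0.5, 0.8):
        print(baseline, round(post_therapy_probability(baseline, 1.5), 3))
```

The same odds ratio produces a smaller change in probability when the baseline is already close to 0 or 1, which is worth bearing in mind when comparing consonant subtypes with different pre-therapy identification rates.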
We visually inspected radar plots of change in listeners’ identification of consonants to group children with similar profiles. Acoustic data Acoustic measurements were made on SWs and CS, with mean differences and 95% confidence intervals (CIs) calculated between post- and pre-therapy recordings. Changes in the acoustic variables were then compared to changes in perceptual variables such as SW and CS intelligibility. These comparisons were made for each individual child and for the groupings derived from the radar plots from the perceptual data analysis. Results Perceptual analysis Using GLMM, we found evidence that the odds of word initial and word final singleton consonants being identified by listeners increased after therapy in both the SW [odds ratio (95% CI) = 1.54 (1.44 to 1.65) and 1.61 (1.51 to 1.73) respectively; all p < 0.01] and CS analyses [1.26 (1.15 to 1.39) and 1.27 (1.15 to 1.41) respectively; all p < 0.01], even after adjusting for potential confounders. There was also evidence for the effect of therapy on the identification of word initial consonant clusters and word final consonant clusters in the SW analyses [1.84 (1.60 to 2.12) and 1.42 (1.15 to 1.75) respectively; all p < 0.01]. Identification of consonant clusters was not examined in the CS data due to the infrequent use of clusters. Additionally, we found evidence of heterogeneity in the effect of therapy on word initial and word final singleton consonant identification between the subtypes of consonants, categorised according to their voicing, place, and manner, in both SW and CS analyses. Nearly all subtypes of consonants, as either word initial or word final singleton consonants, showed an improvement in the probability of being identified by a listener after therapy in the SW analyses, but only a handful showed an improvement in the CS analyses. We identified six groups of children in the SW data and seven groups of children in the CS data. 
Each group categorised children according to their relative standing at pre-therapy, the direction and magnitude of change in percentage identification measures, and additionally which combination of the six measures displayed a change. Acoustic analysis Most of the children whose data were analysed acoustically produced slower speech post-therapy, regardless of gains in intelligibility. There was individual variability in the degree of change in speech rate, with a tendency for greater decreases in the speech rate of children whose initial performance was low. Most children also produced words with higher mean intensity post-therapy regardless of gains in intelligibility. However, the majority of the children with an increase in maximum intensity in initial and final word position had improved intelligibility post-therapy. This was particularly the case for the obstruent category. There was variation in changes to intensity across clusters, word position and manner categories, with more modest increases in the intensity of sonorant sounds between pre- and post-therapy compared with obstruents. There were also more variable changes in speech rate, intensity and their relationship with intelligibility in CS compared with SWs. Conclusions and recommendations The Speech Systems Approach, which focuses on breath control and speech rate, improved the intelligibility of SWs. Our previous research suggests that gains were maintained for up to 12 weeks following the intensive burst of therapy. This type of intervention is now recommended by the National Institute for Health and Care Excellence (NICE) to help address the interaction challenges faced by children with CP. Acoustic analysis of a subset of the data shows change for individuals in intensity and/or duration of individual speech segments. Findings suggest that increased intelligibility was achieved through a stronger vocal signal and by allowing children to articulate individual sounds with greater precision. 
Changes in CS were more modest, suggesting that adaptations to the approach are warranted. The marked individual differences suggest a varying response to therapy between children but also within children across speech segments. Personalisation of the intervention, with cues adapted to the child’s performance, should be investigated. Acoustic analysis of speech during the intervention could aid personalisation, showing how speech changes in response to individual cues. Practice should move to CS quickly. Intelligibility, through listener identification of words and constituent sounds, should continue to be assessed in SWs and CS to investigate the impact of utterance length. Two matched lists of high frequency SWs could be used to facilitate comparison across children and time. Using one list at 6-weeks pre-therapy and immediately post-therapy, and a second list immediately pre-therapy and at follow-up would minimise learning effects. Acoustic change should be measured to help understand how change is achieved by individuals. Study registration This trial is registered as Research Registry 6117. Funding This project was funded by the National Institute for Health and Care Research (NIHR) Efficacy and Mechanism Evaluation programme (NIHR130967) and will be published in full in Efficacy and Mechanism Evaluation; Vol. 10, No. 4. See the NIHR Journals Library website for further project information.
Keywords