Frontiers in Communication (Sep 2019)
Variability and Central Tendencies in Speech Production
Abstract
Speech is notoriously variable, but our understanding of this variability continues to evolve. Variability has typically been taken as an indication of failure to reach a desired target due to physical or neurological limits. However, it is likely that some variability is beneficial, an effect that has been found in other domains. Part of the effort to separate beneficial from destructive variability must be to understand the distribution of values around a speech target. One aspect that is commonly measured is the standard deviation of some objective aspect of speech. The standard deviation is most meaningful for normal distributions, and the assumption in speech research has been that values are indeed normally distributed. This has not been rigorously tested, however, as the test of normality requires a large number of samples (some studies suggest a minimum of 200) to determine whether the data is normally distributed or not. Speech research (and, indeed, most research with humans) seldom reaches such numbers for a consistent environment. Here, an initial estimate for 300 repetitions of English words by a single speaker are presented. The words were pseudo-randomized with an equal number of filler items, so that immediate repetitions (and the neural and physical fatigue repetition can cause) were avoided. One hundred trials were collected on each of 3 days. Words were chosen to have very little coarticulatory influence (“heed,” “ode”/“owed”) or sizable coarticulatory influence (“geek,” “dote”). Measurements of vowel formants at acoustic midpoints indicated that the distributions were indeed normal. This was true even of the high coarticulatory environment, which some theories would predict would be skewed by the vowel's reaching the edge of an acceptable region. The current results indicate that vowel targets are consistent for different environments. Further, the range of the distributions was quite similar across the two types of environment, being, for example, about 100 Hz for F1. The amount of variability is fairly substantial but can be presumed to be beneficial, as all items were heard correctly. The normality of the distribution nonetheless indicates a control structure that accommodates the coarticulatory environment at the level of planning.
Keywords