نشریه پژوهشهای زبانشناسی (Sep 2020)
Acoustic Correlations of Speech Rhythms in Persian Based on Variability of Between-speakers Characteristics
Abstract
The durational variability of phonetic intervals is regarded as one of the properties of speech rhythm. These intervals include segmental, vowel, consonantal, vocalic, intervocalic, voiced, unvoiced, syllable, and syllable-peak intervals. Durational variability measures for some of these intervals, such as vowel, consonantal, vocalic, and intervocalic intervals, are used to classify languages by rhythm type. In addition, in some cases a speaker can be identified only by voice, and both the segmental and the suprasegmental properties of a language can be used for this purpose. In this study, the acoustic correlates of Persian speech rhythm in a read text are calculated with various durational measures. Between-speaker rhythmic variability is also examined to find the best rhythmic measures for Persian speaker identification. The results confirm that Persian is close to the syllable-based languages. Moreover, the segmental and suprasegmental results demonstrate significant between-speaker variability in Persian. Among the rhythm measures, nPVI-VC and %V (percentage of vocalic intervals) best discriminate between speakers in Persian.
Keywords: Speech rhythm, Durational variability, Acoustic correlations, Between-speaker variability, Rhythmic measures
Introduction
The rhythmic properties of languages have been a controversial issue in recent linguistic research. Early studies on the classification of rhythm types focused on syllable and foot durations and defined speech rhythm in terms of isochrony (Abercrombie, 1967; Lloyd James, 1940; Pike, 1945). On this view, Germanic languages have isochronous feet, which is why they were called "stress-timed" languages.
It was also believed that Romance languages had syllables of equal duration, so they were called "syllable-timed" languages.
However, these isochrony assumptions are easily violated in spontaneous speech (Dauer, 1987). Dauer argued that languages with different rhythms also differ in syllable weight and vowel reduction. Stress-timed languages usually have a more complex syllable structure and a higher rate of vowel reduction. Ramus, Nespor, and Mehler (1999) examined this hypothesis by measuring the standard deviation of vocalic (∆V) and consonantal (∆C) intervals as well as the percentage of vocalic intervals (%V) for each sentence. Grabe and Low (2002) then introduced the pairwise variability index (PVI) to measure durational variability between successive vocalic and consonantal intervals (nPVI-V and rPVI-C). In addition, Dellwo (2010) proposed other methods of normalizing for speech rate, including the coefficient of variation (Varco) and the natural logarithm. Arvaniti (2012) introduced an amplitude-envelope-based rhythm measure, with which she investigated the repetition of acoustic information rather than segmental units.
Another application of rhythm measures is in forensic science. Because speakers of the same language have different voices, one concern of forensic science is the difference between speakers' voices (Rose, 2004). Dellwo, Leemann, and Kolly (2015) cited three reasons for this diversity: the nature of the articulatory system, linguistic factors, and prosodic factors. This variety among speakers' voices is called between-speaker variability. Recent evidence from various datasets suggests that rhythm measures based on different phonetic intervals can vary significantly within a language as a function of speaker (Leemann, Kolly, and Dellwo, 2014; Wiget et al., 2010; Yoon, 2010).
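The interval-based measures named above (∆V, ∆C, %V, Varco, rPVI, nPVI, and the log-transformed ∆) can be sketched in a few lines of Python. This is only an illustrative sketch of the commonly cited definitions, not the script used in the study. The function names, the use of the sample standard deviation, and the example durations are all assumptions made for illustration.

```python
import math
from statistics import mean, stdev


def percent_v(vocalic, consonantal):
    """%V: share of total duration taken up by vocalic intervals, in percent."""
    total = sum(vocalic) + sum(consonantal)
    return 100.0 * sum(vocalic) / total


def delta(intervals):
    """Delta measure (e.g. deltaV, deltaC): standard deviation of interval durations.
    Assumption: the sample standard deviation (n - 1) is used here."""
    return stdev(intervals)


def delta_ln(intervals):
    """Delta computed on natural-log-transformed durations."""
    return stdev([math.log(d) for d in intervals])


def varco(intervals):
    """Varco: coefficient of variation of interval durations, in percent."""
    return 100.0 * stdev(intervals) / mean(intervals)


def rpvi(intervals):
    """Raw PVI: mean absolute difference between successive intervals."""
    diffs = [abs(a - b) for a, b in zip(intervals, intervals[1:])]
    return mean(diffs)


def npvi(intervals):
    """Normalized PVI: each successive difference is divided by the mean
    of the two intervals involved, then averaged and scaled by 100."""
    terms = [abs(a - b) / ((a + b) / 2) for a, b in zip(intervals, intervals[1:])]
    return 100.0 * mean(terms)


# Hypothetical durations in milliseconds for one sentence (not study data).
vocalic_ms = [82.0, 110.0, 64.0, 95.0, 130.0]
consonantal_ms = [70.0, 145.0, 58.0, 102.0]

print("%V     :", round(percent_v(vocalic_ms, consonantal_ms), 2))
print("deltaC :", round(delta(consonantal_ms), 2))
print("VarcoC :", round(varco(consonantal_ms), 2))
print("nPVI-V :", round(npvi(vocalic_ms), 2))
print("rPVI-C :", round(rpvi(consonantal_ms), 2))
```

The normalized measures (Varco, nPVI, and the log-transformed ∆) divide out or compress overall duration, which is why they are less sensitive to speech rate than the raw ∆ and rPVI values.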
Materials & Methods
Ten native speakers of contemporary standard Persian (5 men and 5 women) read a Persian text of "The North Wind and the Sun" in the acoustic room at Shiraz University. The Persian version of this story contains seven complex sentences, so the dataset comprised 70 tokens (10 speakers × 7 sentences).
The corpus was acoustically analyzed in Praat (v 6.1.09), in which TextGrids with six tiers were created. In the first tier, the onset and offset of each segment were determined manually and transcribed in IPA. In the second tier, the vowels and consonants were tagged. In the third tier, the vowel and consonant intervals were labeled according to the number of consonants and vowels. In the fourth tier, the vocalic and consonantal intervals were determined. In the fifth tier, the syllable boundaries were tagged manually. Finally, in the sixth tier, the peak of each syllable was identified automatically, according to the sonority principle, by a script written by Dellwo [1]. Speech rhythm measures from previous work were then applied; all measures were calculated automatically with the existing script written by Dellwo.
The means and standard deviations of the script output were calculated in SPSS (v 23) to classify the rhythm of Persian. Pearson correlation and one-way ANOVA were then used to identify the most robust between-speaker measures.
Discussion of Results and Conclusions
The results confirm that Persian is close to the syllable-based languages. In addition, seven metrics reached statistical significance: speech rate (syl/s), VarcoC, %V, nPVI-V, nPVI-VC, ∆C(ln), and ∆Peak(ln). Based on the results of the present study, nPVI-VC and %V are the most powerful measures of between-speaker variability in Persian.
[1] https://www.cl.uzh.ch/de/people/team/phonetics/vdellw.html
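As an illustration of the one-way ANOVA step mentioned under Materials & Methods, the sketch below computes the F statistic for a single rhythm measure. It assumes that speaker is the grouping factor and that each speaker contributes one value per sentence. The study itself used SPSS, so the function `one_way_anova_f` and the toy values are hypothetical and are not results from the paper.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations.

    Each inner list holds the values of one measure for one speaker.
    Assumes at least two groups and some within-group variation.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)        # number of speakers
    n = len(all_values)    # total number of observations

    # Variation of the speaker means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own speaker mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Toy values for three hypothetical speakers (not study data).
toy_groups = [
    [41.0, 43.5, 42.2, 40.8],
    [47.1, 46.0, 48.3, 47.7],
    [44.0, 45.2, 43.1, 44.6],
]
f_value, df_b, df_w = one_way_anova_f(toy_groups)
print(f"F({df_b}, {df_w}) = {f_value:.2f}")
```

Under this assumed design, 10 speakers with 7 sentences each would give 9 between-group and 60 within-group degrees of freedom. A larger F for a given measure means that speakers differ more on that measure, relative to the variation across each speaker's own sentences.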
Keywords