Maximizing accuracy of forced alignment for spontaneous child speech

Joshua Wilson Black; Lynn Clark; Margaret Blackwood; Robert Fromont

doi:10.34842/shrr-sv10

Language Development Research (Sep 2023)

Affiliations

Joshua Wilson Black: New Zealand Institute of Language, Brain and Behaviour, University of Canterbury
Lynn Clark: New Zealand Institute of Language, Brain and Behaviour, University of Canterbury
Margaret Blackwood: New Zealand Institute of Language, Brain and Behaviour, University of Canterbury
Robert Fromont: New Zealand Institute of Language, Brain and Behaviour, University of Canterbury

DOI: https://doi.org/10.34842/shrr-sv10
Journal volume & issue: Vol. 3, no. 1

Abstract

Sociophonetic study of large speech corpora generally requires the use of forced alignment (the automatic process of determining the start and end time of each speech sound within the recording) to facilitate large-scale automated extraction of acoustic measurements of targeted vowels or consonants. There is an extensive literature evaluating the alignment accuracy of a number of forced alignment tools and procedures, processing speech data from a range of languages and dialects. In general, these evaluations use typical adult speech data, often elicited in a controlled laboratory environment. There is little literature on the effectiveness of forced alignment systems on child speech, and none on speech elicited in field environments. This presents a problem for research at the intersection of language acquisition and sociophonetics, as there is no established best practice for automatically aligning child speech. Child speech presents special challenges to automated tools, as it includes more variation in speech sounds and voice quality, and non-standard pronunciation and prosody. We evaluated two toolkits, Kaldi via the Montreal Forced Aligner (MFA), and the Hidden Markov Model Toolkit (HTK), using different configurations to force align non-rhotic child speech elicited in a preschool environment. Against many of our expectations, we found that MFA, using rhotic acoustic models pre-trained on adult speech, performed best. This paper provides a clear methodology for other researchers in sociophonetics to evaluate the success or otherwise of phonetic alignment.

Published in Language Development Research

ISSN: 2771-7976 (Online)
Publisher: Carnegie Mellon University Library Publishing Service
Country of publisher: United States
LCC subjects: Language and Literature: Philology. Linguistics
Website: https://ldr.lps.library.cmu.edu/
