Automatic speech signal segmentation based on the innovation adaptive filter

Makowski Ryszard; Hossa Robert

doi:10.2478/amcs-2014-0019

International Journal of Applied Mathematics and Computer Science (Jun 2014)

Affiliations

Makowski Ryszard: Faculty of Electronics, Wrocław University of Technology, Wybrzeże Wyspiańskiego 27, 50-370 Wrocław, Poland
Hossa Robert: Faculty of Electronics, Wrocław University of Technology, Wybrzeże Wyspiańskiego 27, 50-370 Wrocław, Poland

DOI: https://doi.org/10.2478/amcs-2014-0019
Journal volume & issue: Vol. 24, no. 2, pp. 259–270

Abstract

Speech segmentation is an essential stage in designing automatic speech recognition systems, and several algorithms have been proposed in the literature. It is a difficult problem, as speech is immensely variable. The aim of the authors’ studies was to design an algorithm that could be employed at the stage of automatic speech recognition. This would make it possible to avoid some problems related to speech signal parametrization. Posing the problem in such a way requires the algorithm to be capable of working in real time. The only such algorithm was proposed by Tyagi et al. (2006), and it is a modified version of Brandt’s algorithm. The article presents a new algorithm for unsupervised automatic speech signal segmentation. It performs segmentation without access to information about the phonetic content of the utterances, relying exclusively on second-order statistics of the speech signal. The starting point for the proposed method is the time-varying Schur coefficients of an innovation adaptive filter. The Schur algorithm is known to be fast, precise, stable and capable of rapidly tracking changes in second-order signal statistics. A transition from one phoneme to another in the speech signal always indicates a change in signal statistics caused by vocal tract changes. To account for the properties of human hearing, detection of inter-phoneme boundaries is based on statistics defined on the mel spectrum determined from the reflection coefficients. The paper presents the structure of the algorithm, defines its properties, lists parameter values, describes detection efficiency results, and compares them with those for another algorithm. The obtained segmentation results are satisfactory.
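
The pipeline outlined in the abstract (reflection coefficients, then a mel-warped spectrum, then a change statistic) can be illustrated with a rough sketch. This is not the authors' algorithm. It replaces the sample-by-sample adaptive Schur filter with a block-wise Levinson-Durbin recursion, which also yields reflection coefficients, and it uses a plain RMS distance between consecutive mel log-spectra as the change statistic. The function names, frame length, hop, model order, number of mel bands and threshold are all hypothetical choices made for illustration; none of them are taken from the paper.

```python
import numpy as np


def reflection_coeffs(frame, order=10):
    """Levinson-Durbin recursion on the frame autocorrelation.

    Stand-in for the adaptive Schur filter (assumption, not the paper's method).
    Returns reflection coefficients k, AR polynomial a (a[0] = 1) and the
    final prediction error energy e.
    """
    n = len(frame)
    r = np.array([np.dot(frame[:n - m], frame[m:]) for m in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    e = r[0] + 1e-12  # small floor avoids division by zero on silent frames
    k = np.zeros(order)
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        ki = -acc / e
        k[i - 1] = ki
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + ki * a_prev[i - j]
        a[i] = ki
        e *= 1.0 - ki * ki
    return k, a, e


def mel_log_spectrum(a, fs, n_fft=512, n_bands=20):
    """Gain-normalised AR log-spectrum (dB) sampled at mel-spaced frequencies."""
    spec = 1.0 / (np.abs(np.fft.rfft(a, n_fft)) ** 2 + 1e-12)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    mel_max = 2595.0 * np.log10(1.0 + (fs / 2.0) / 700.0)
    mel_pts = np.linspace(0.0, mel_max, n_bands)
    hz_pts = 700.0 * (10.0 ** (mel_pts / 2595.0) - 1.0)
    return np.interp(hz_pts, freqs, 10.0 * np.log10(spec))


def detect_boundaries(signal, fs, frame_len=256, hop=128, order=10, thresh=3.0):
    """Return sample positions where the mel spectrum changes sharply.

    thresh is an RMS spectral distance in dB (hypothetical value).
    """
    window = np.hamming(frame_len)
    starts = list(range(0, len(signal) - frame_len + 1, hop))
    if len(starts) < 2:
        return np.array([], dtype=int)
    feats = []
    for s in starts:
        _, a, _ = reflection_coeffs(signal[s:s + frame_len] * window, order)
        feats.append(mel_log_spectrum(a, fs))
    feats = np.array(feats)
    # RMS distance between the mel spectra of consecutive frames
    dist = np.sqrt(np.mean(np.diff(feats, axis=0) ** 2, axis=1))
    last = len(dist) - 1
    peaks = [i for i in range(len(dist))
             if dist[i] > thresh
             and (i == 0 or dist[i] >= dist[i - 1])
             and (i == last or dist[i] >= dist[i + 1])]
    # Place each boundary midway between the centres of the two frames compared
    return np.array([i * hop + hop // 2 + frame_len // 2 for i in peaks], dtype=int)
```

Calling `detect_boundaries(x, fs)` on a mono signal `x` sampled at `fs` Hz returns candidate boundary positions in samples. A block-wise estimate like this localises a boundary only to within roughly one frame, which is one reason a sample-adaptive filter such as the one described in the abstract is attractive for this task.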

Published in International Journal of Applied Mathematics and Computer Science

ISSN: 2083-8492 (Online)
Publisher: Sciendo
Country of publisher: Poland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.amcs.uz.zgora.pl/
