Theoretical learning guarantees applied to acoustic modeling

Christopher D. Shulby; Martha D. Ferreira; Rodrigo F. de Mello; Sandra M. Aluisio

doi:10.1186/s13173-018-0081-3

Journal of the Brazilian Computer Society (Jan 2019)

Affiliations

Christopher D. Shulby: Samsung SIDI Institute, Rua Aguaçu
Martha D. Ferreira: Institute of Mathematical and Computer Sciences, University of São Paulo
Rodrigo F. de Mello: Institute of Mathematical and Computer Sciences, University of São Paulo
Sandra M. Aluisio: Institute of Mathematical and Computer Sciences, University of São Paulo

DOI: https://doi.org/10.1186/s13173-018-0081-3
Journal volume & issue: Vol. 25, no. 1, pp. 1–12

Abstract

In low-resource scenarios, such as small datasets or limited computational resources, state-of-the-art deep learning methods for speech recognition have been known to fail. More robust models can be achieved if care is taken to ensure the learning guarantees provided by statistical learning theory. This work presents a shallow, hybrid approach in which a convolutional neural network feature extractor feeds a hierarchical tree of support vector machines for classification. We show that gross errors present even in state-of-the-art systems can be avoided and that an accurate acoustic model can be built in a hierarchical fashion. Furthermore, we present proof that our algorithm adheres to the learning guarantees provided by statistical learning theory. The acoustic model produced in this work outperforms traditional hidden Markov models, and the hierarchical support vector machine tree outperforms a multi-class multilayer perceptron classifier using the same features. More importantly, we isolate the performance of the acoustic model and report results at both the frame and phoneme levels in order to assess the true robustness of the model. We show that even with a small amount of data, accurate and robust recognition rates can be obtained.

Published in Journal of the Brazilian Computer Society

ISSN: 0104-6500 (Print); 1678-4804 (Online)
Publisher: Sociedade Brasileira de Computação
Country of publisher: Brazil
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://sol.sbc.org.br/journals/index.php/jbcs/
