IEEE Access (Jan 2018)

Age Estimation in Short Speech Utterances Based on LSTM Recurrent Neural Networks

  • Ruben Zazo,
  • Phani Sankar Nidadavolu,
  • Nanxin Chen,
  • Joaquin Gonzalez-Rodriguez,
  • Najim Dehak

DOI
https://doi.org/10.1109/ACCESS.2018.2816163
Journal volume & issue
Vol. 6
pp. 22524 – 22530

Abstract

Age estimation from speech has recently received increased interest, as it is useful for many applications such as user profiling, targeted marketing, or personalized call routing. Applications of this kind need to estimate the age of the speaker quickly and might benefit greatly from real-time capabilities. Long short-term memory (LSTM) recurrent neural networks (RNNs) have been shown to outperform state-of-the-art approaches in related speech-based tasks, such as language identification or voice activity detection, especially when an accurate real-time response is required. In this paper, we propose a novel age estimation system based on LSTM-RNNs. The system can handle short utterances (from 3 to 10 s) and can be easily deployed in a real-time architecture. The proposed system has been tested and compared with a state-of-the-art i-vector approach using data from the NIST Speaker Recognition Evaluation 2008 and 2010 data sets. Experiments on short-duration utterances show that the new approach achieves a relative improvement of up to 28% in mean absolute error over the baseline system.
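
The evaluation metric named in the abstract, mean absolute error (MAE), and the notion of a relative improvement over a baseline can be illustrated with a minimal Python sketch. The function names and all age values below are hypothetical and chosen only for illustration; they are not taken from the paper or its data sets.

```python
from typing import Sequence


def mean_absolute_error(true_ages: Sequence[float], predicted_ages: Sequence[float]) -> float:
    """Average absolute difference, in years, between true and predicted ages."""
    if len(true_ages) != len(predicted_ages) or not true_ages:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(true_ages, predicted_ages)) / len(true_ages)


def relative_improvement(baseline_mae: float, proposed_mae: float) -> float:
    """Percentage by which the proposed system lowers the baseline MAE."""
    return 100.0 * (baseline_mae - proposed_mae) / baseline_mae


# Made-up ages for illustration only; these are NOT results from the paper.
true_ages = [25.0, 40.0, 60.0, 33.0]
baseline_pred = [35.0, 30.0, 50.0, 43.0]  # hypothetical baseline outputs
proposed_pred = [30.0, 45.0, 50.0, 38.0]  # hypothetical proposed-system outputs

baseline_mae = mean_absolute_error(true_ages, baseline_pred)
proposed_mae = mean_absolute_error(true_ages, proposed_pred)
improvement = relative_improvement(baseline_mae, proposed_mae)

print(f"baseline MAE: {baseline_mae:.2f} years")
print(f"proposed MAE: {proposed_mae:.2f} years")
print(f"relative improvement: {improvement:.1f}%")
```

Because MAE is an error measure, a lower value is better, so the relative improvement is computed as the reduction in MAE divided by the baseline MAE.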

Keywords