A Bitrate-Scalable Variational Recurrent Mel-Spectrogram Coder for Real-Time Resynthesis-Based Speech Coding

Benjamin Stahl; Simon Windtner; Alois Sontacchi

doi:10.1109/access.2024.3482359

IEEE Access (Jan 2024)

Affiliations

Benjamin Stahl: Institute of Electronic Music and Acoustics, University of Music and Performing Arts Graz, Graz, Austria
Simon Windtner: Institute of Electronic Music and Acoustics, University of Music and Performing Arts Graz, Graz, Austria
Alois Sontacchi: Institute of Electronic Music and Acoustics, University of Music and Performing Arts Graz, Graz, Austria

DOI: https://doi.org/10.1109/access.2024.3482359
Journal volume & issue: Vol. 12
pp. 159239 – 159251

Abstract

This paper introduces a method for real-time speech coding that combines a binary-latent-vector variational recurrent neural network for mel-spectrogram coding with a non-autoregressive convolutional vocoder for waveform reconstruction. To enable bitrate scalability, we propose a latent vector truncation and padding technique. We evaluate both fixed- and scalable-bitrate variants of the proposed method, comparing them to a baseline vector quantization-based coder. The method is also benchmarked against Opus, Lyra v2, EnCodec, and AudioDec using objective metrics and subjective ratings from a MUSHRA listening test. At 1.38 kbps, the proposed method significantly outperforms Lyra v2 at 3 kbps, and at 5.51 kbps it matches the performance of Lyra v2 at 6 kbps. Although AudioDec significantly surpasses the proposed method at 6.4 kbps on test data from the TSP speech dataset, the proposed method shows competitive or superior results on withheld speakers from the VCTK dataset. The results show that recurrent coding with binary latent vectors is a viable alternative to prevailing vector quantization-based approaches.
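
The abstract gives no details of the latent vector truncation and padding technique, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a per-frame latent vector of bits in {0, 1}, that only the leading bits are transmitted at reduced bitrates, and that the receiver pads the missing positions with a fixed value before decoding. The function names, the padding value, and the numbers in the usage note are all hypothetical.

```python
from typing import List


def truncate_latent(bits: List[int], n_keep: int) -> List[int]:
    """Sender side: keep only the first n_keep bits of one frame's latent vector."""
    if not 0 <= n_keep <= len(bits):
        raise ValueError("n_keep must lie between 0 and the latent dimension")
    return bits[:n_keep]


def pad_latent(received: List[int], full_dim: int, pad_value: float = 0.0) -> List[float]:
    """Receiver side: restore the full latent dimension expected by the decoder.

    pad_value is an assumption of this sketch; the abstract does not say
    which value fills the positions that were not transmitted.
    """
    if len(received) > full_dim:
        raise ValueError("received more bits than the latent dimension")
    return [float(b) for b in received] + [pad_value] * (full_dim - len(received))


def bitrate_kbps(bits_per_frame: int, frames_per_second: float) -> float:
    """Bitrate implied by sending bits_per_frame bits for every coded frame."""
    return bits_per_frame * frames_per_second / 1000.0
```

Under these assumptions, sending 50 bits per frame at a hypothetical rate of 80 frames per second would correspond to `bitrate_kbps(50, 80)`, i.e. 4 kbps, and lowering `n_keep` reduces the bitrate proportionally without changing the decoder's input size.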

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
