Streaming ASR Encoder for Whisper-to-Speech Online Voice Conversion

Anastasia Avdeeva; Aleksei Gusev; Tseren Andzhukaev; Artem Ivanov

doi:10.1109/OJSP.2023.3343342

IEEE Open Journal of Signal Processing (Jan 2024)

Affiliations

Anastasia Avdeeva: FluentaAI, Wilmington, DE, USA
Aleksei Gusev: FluentaAI, Wilmington, DE, USA
Tseren Andzhukaev: FluentaAI, Wilmington, DE, USA
Artem Ivanov: FluentaAI, Wilmington, DE, USA

DOI: https://doi.org/10.1109/OJSP.2023.3343342
Journal volume & issue: Vol. 5, pp. 160–167

Abstract

Whispered speech is a quiet form of speech produced without vocalization. One common use of whispering is as a technique that can help overcome stuttering. However, whispered speech can be uncomfortable and difficult to understand in everyday communication. To address these problems, we propose a low-delay whisper-to-speech voice conversion method, which can be useful in real-life communication for people with disordered speech. As part of this research, we study the impact of streaming Automatic Speech Recognition (ASR) models on voice conversion quality: we compare different streaming models and methods for adapting models to streaming settings, and we show the importance of using such models for low-delay voice conversion.

Published in IEEE Open Journal of Signal Processing

ISSN: 2644-1322 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=8782710
