IEEE Open Journal of Signal Processing (Jan 2024)
Streaming ASR Encoder for Whisper-to-Speech Online Voice Conversion
Abstract
Whispered speech is a quiet voice without vocalization. One of the common cases of using whispered speech is a technique that can help overcome stuttering. But whispered speech can be uncomfortable and difficult to understand in everyday communication. To address these problems, we propose a method of low-delayed whisper-to-speech voice conversion, which can be useful in real life communication of people with disordered speech. As part of our research, we study the impact of streaming Automatic Speech Recognition models on the quality of voice conversion, comparing different streaming models and methods for model adaptation to streaming settings, and showing the importance of using such models in cases of low-delayed voice conversion.
Keywords