IEEE Access (Jan 2023)
Log-Spectral Amplitude and Spectral Polarity Estimation in Short-Time Discrete Cosine Transform Domain
Abstract
Single-channel speech enhancement based on short-time spectral amplitude (STSA) estimation often uses the unmodified noisy phase spectrum for speech re-synthesis, thereby introducing undesired artifacts into the enhanced speech. Using the discrete cosine transform (DCT) instead of the discrete Fourier transform (DFT) mitigates these artifacts because the consequences of using noisy DCT polarities for speech re-synthesis are less severe than those of using noisy DFT phases. Although DFT-based STSA estimators have been studied extensively, such estimators have not been sufficiently developed for the DCT domain. This study aims to demonstrate the superiority of the DCT representation in STSA estimation-based speech enhancement. To achieve this, we first derive the DCT-based STSA estimator that minimizes the mean squared error (MSE) of the log-spectral amplitudes (LSA). We then propose a novel DCT polarity estimator to be used in combination with the STSA estimator. The clean speech DCT coefficients are modeled by a Gaussian or a Laplace density, and the noise DCT coefficients are modeled by a Gaussian density. To assess the enhanced speech, objective and subjective quality measures are employed. Results show that the new estimators perform better than the corresponding DFT-based estimators and are widely preferred by listeners. Moreover, the proposed STSA estimators can be expressed in closed form, whereas the DFT-based estimator with a super-Gaussian speech prior has no closed-form solution.
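To make the amplitude and polarity terminology concrete, the following minimal sketch splits the DCT coefficients of one frame into amplitudes and polarities and re-synthesizes the frame from them. It is an illustration only, not the estimators proposed in the paper: the orthonormal DCT-II convention, the function names (`dct2`, `idct2`, `split_amplitude_polarity`), and the toy frame are all assumptions, and no windowing, noise model, or estimation is included.

```python
import math

def dct2(x):
    """Orthonormal DCT-II of a real sequence (assumed convention)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def idct2(c):
    """Inverse of the orthonormal DCT-II above (a DCT-III)."""
    n = len(c)
    out = []
    for t in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(
            c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (t + 0.5) * k / n)
            for k in range(1, n)
        )
        out.append(s)
    return out

def split_amplitude_polarity(c):
    """Hypothetical helper: real coefficient = amplitude * polarity (+1 or -1)."""
    amplitude = [abs(v) for v in c]
    polarity = [1.0 if v >= 0 else -1.0 for v in c]
    return amplitude, polarity

# Toy frame (assumed data, not from the paper).
frame = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]

coeffs = dct2(frame)
amplitude, polarity = split_amplitude_polarity(coeffs)

# Re-synthesis from amplitude and polarity recovers the frame up to rounding.
rebuilt = idct2([a * p for a, p in zip(amplitude, polarity)])
```

Because DCT coefficients are real, the counterpart of the DFT phase is a binary sign per coefficient. In an enhancement setting, the amplitudes would be replaced by estimates and the polarities would be taken from the noisy signal or from a polarity estimator before the inverse transform.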
Keywords