Dianxin kexue (Nov 2023)
A method of synthetic speech spoofing detection using constant Q modulation envelope
Abstract
To address the low accuracy of synthetic speech spoofing detection based on traditional acoustic feature parameters, its poor performance on unknown types of synthetic speech, and its degradation in noisy environments, a method for detecting spoofed synthetic speech using the constant Q modulation envelope (CQME) is proposed. The method is motivated by the observation that the temporal envelope of speech carries abundant information and that the envelopes of synthetic and genuine speech differ considerably in their fine detail. The modulation envelope spectrum of speech is obtained with the constant Q transform (CQT), and the root mean square of each frequency component is calculated to derive the CQME feature vector. The CQME feature vector is then used to train a random forest classifier that discriminates genuine speech from spoofed synthetic speech. Experimental results demonstrate that the random forest trained with CQME features achieves high detection performance on the ASVspoof 2019 dataset and exhibits good detection efficacy for unknown types of synthetic speech. Furthermore, the proposed method maintains high detection performance under various noise conditions, indicating excellent noise robustness.
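The feature pipeline described in the abstract (temporal envelope, constant Q transform, per-frequency root mean square) can be illustrated with a minimal NumPy sketch. The abstract does not give implementation details, so everything specific here is an assumption made for illustration: the envelope is taken as the magnitude of the analytic signal of the full-band waveform, the CQT is a naive direct implementation with Hann windows, and the function names and the parameters `fmin`, `n_bins`, `bins_per_octave`, and `hop` are hypothetical rather than the authors' settings.

```python
import numpy as np


def temporal_envelope(x):
    """Magnitude of the analytic signal, via an FFT-based Hilbert transform.

    Assumption: the abstract does not say how the envelope is extracted.
    """
    n = len(x)
    gain = np.zeros(n)
    gain[0] = 1.0                        # keep the DC term
    if n % 2 == 0:
        gain[n // 2] = 1.0               # keep the Nyquist term
        gain[1:n // 2] = 2.0             # double the positive frequencies
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * gain))


def cqme_features(x, fs, fmin=4.0, n_bins=24, bins_per_octave=6, hop=None):
    """Illustrative CQME-style vector: RMS over time of each constant-Q bin
    of the temporal envelope. All parameter defaults are hypothetical."""
    env = temporal_envelope(np.asarray(x, dtype=float))
    env = env - env.mean()               # remove the envelope's DC offset
    hop = hop or max(1, int(0.01 * fs))  # frame step in samples
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)  # constant Q factor
    feats = np.zeros(n_bins)
    for k in range(n_bins):
        fk = fmin * 2.0 ** (k / bins_per_octave)      # geometric bin centers
        nk = int(np.ceil(q * fs / fk))                # window length ~ 1 / fk
        # Zero-pad signals shorter than the analysis window for this bin.
        sig = env if len(env) >= nk else np.pad(env, (0, nk - len(env)))
        window = np.hanning(nk)
        kernel = window * np.exp(-2j * np.pi * fk * np.arange(nk) / fs)
        kernel = kernel / window.sum()
        starts = range(0, len(sig) - nk + 1, hop)
        coeffs = np.array([np.dot(sig[s:s + nk], kernel) for s in starts])
        feats[k] = np.sqrt(np.mean(np.abs(coeffs) ** 2))  # RMS across frames
    return feats


# Synthetic check signal: a 200 Hz carrier amplitude-modulated at 8 Hz.
fs = 2000
t = np.arange(4 * fs) / fs
x = (1.0 + 0.5 * np.cos(2 * np.pi * 8.0 * t)) * np.cos(2 * np.pi * 200.0 * t)
features = cqme_features(x, fs)
print(features.shape, int(np.argmax(features)))
```

With these illustrative settings, the bin centered at 8 Hz (index 6) should dominate for the test signal, since the modulation energy sits at that frequency. In a detection setup along the lines of the abstract, one such vector per utterance would be passed, together with genuine/spoofed labels, to a random forest implementation such as scikit-learn's `RandomForestClassifier`; the classifier configuration is not specified in the abstract.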