Dianxin kexue (Jun 2023)
Synthetic speech detection method using texture feature based on circumferential local ternary pattern
Abstract
To further improve the accuracy of synthetic speech detection, a detection method using texture features based on the circumferential local ternary pattern (CLTP) was proposed. The method extracted texture information from the speech spectrogram using the CLTP and used it as the feature representation of the speech. A deep residual network was employed as the back-end classifier to decide whether the speech was genuine or spoofed. Experimental results on the ASVspoof 2019 dataset show that the proposed method reduced the equal error rate (EER) by 54.29% and 2.15% compared with the traditional constant Q cepstral coefficient (CQCC) and linear predictive cepstral coefficient (LPCC) features, respectively, and by 17.14% compared with local ternary pattern (LTP) texture features. The CLTP takes into account both the differences between the central pixel and the peripheral pixels of a neighborhood and the differences among the peripheral pixels themselves. It can therefore capture more texture information from the speech spectrogram and improve the accuracy of synthetic speech detection.
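The two kinds of pixel differences described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the CLTP definition, so the 3x3 neighborhood, the clockwise pairing of adjacent peripheral pixels, the threshold t, and all function names here are assumptions made for illustration only. The center-versus-periphery part follows the usual LTP convention of quantizing each difference to +1, 0, or -1 and splitting the result into an upper and a lower binary pattern.

```python
# Offsets of the 8 peripheral pixels of a 3x3 neighborhood, listed clockwise
# from the top-left corner. The ordering is an assumption of this sketch.
RING = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def ternary(diff, t):
    """Quantize a difference to +1, 0, or -1 using threshold t (t > 0)."""
    if diff >= t:
        return 1
    if diff <= -t:
        return -1
    return 0


def ltp_codes(patch, t):
    """Conventional LTP: compare each peripheral pixel with the center pixel."""
    center = patch[1][1]
    return [ternary(patch[1 + dr][1 + dc] - center, t) for dr, dc in RING]


def circumferential_codes(patch, t):
    """Assumed circumferential term: compare each peripheral pixel with its
    clockwise neighbor on the ring (hypothetical pairing, not from the paper)."""
    ring = [patch[1 + dr][1 + dc] for dr, dc in RING]
    return [ternary(ring[(i + 1) % 8] - ring[i], t) for i in range(8)]


def split_upper_lower(codes):
    """Split ternary codes into two 8-bit patterns: bits set where the code
    is +1 (upper) and bits set where the code is -1 (lower)."""
    upper = sum(1 << i for i, c in enumerate(codes) if c == 1)
    lower = sum(1 << i for i, c in enumerate(codes) if c == -1)
    return upper, lower


# Toy 3x3 spectrogram patch (arbitrary magnitudes) and threshold.
patch = [[10, 20, 30],
         [40, 50, 60],
         [70, 80, 90]]
t = 5

center_codes = ltp_codes(patch, t)
ring_codes = circumferential_codes(patch, t)
center_patterns = split_upper_lower(center_codes)
ring_patterns = split_upper_lower(ring_codes)
```

In a texture-feature pipeline of this kind, such codes would typically be computed for every pixel of the spectrogram and then summarized, for example as histograms, before being passed to the classifier; how the proposed method combines the two sets of codes is not stated in the abstract.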