Applied Sciences (Jun 2024)

Pyramid Feature Attention Network for Speech Resampling Detection

  • Xinyu Zhou,
  • Yujin Zhang,
  • Yongqi Wang,
  • Jin Tian,
  • Shaolun Xu

DOI
https://doi.org/10.3390/app14114803
Journal volume & issue
Vol. 14, no. 11
p. 4803

Abstract

Read online

Speech forgery and tampering, increasingly facilitated by advanced audio editing software, pose significant threats to the integrity and privacy of digital speech avatars. Speech resampling is a post-processing operation of various speech-tampering means, and the forensic detection of speech resampling is of great significance. For speech resampling detection, most of the previous works used traditional methods of feature extraction and classification to distinguish original speech from forged speech. In view of the powerful ability of deep learning to extract features, this paper converts the speech signal into a spectrogram with time-frequency characteristics, and uses the feature pyramid network (FPN) with the Squeeze and Excitation (SE) attention mechanism to learn speech resampling features. The proposed method combines the low-level location information and the high-level semantic information, which dramatically improves the detection performance of speech resampling. Experiments were carried out on a resampling corpus made on the basis of the TIMIT dataset. The results indicate that the proposed method significantly improved the detection accuracy of various resampled speech. For the tampered speech with a resampling factor of 0.9, the detection accuracy is increased by nearly 20%. In addition, the robustness test demonstrates that the proposed model has strong resistance to MP3 compression, and the overall performance is better than the existing methods.

Keywords