IEEE Access (Jan 2024)

An Attack-Independent Audio Forgery Detection Technique Based on Cochleagram Images of Segments With Dynamic Threshold

  • Beste Ustubioglu

DOI
https://doi.org/10.1109/ACCESS.2024.3409543
Journal volume & issue
Vol. 12
pp. 82660 – 82675

Abstract


Thanks to advanced audio editing software, speech recordings can be tampered with very quickly. If speech recordings are used as forensic evidence, splicing recordings together, cutting them, or altering their content is legally unacceptable and constitutes a crime. Audio copy-move forgery, the most common type of forgery used to alter speech content, is performed by copying a segment of an audio recording and pasting it elsewhere in the same recording. This study proposes a new, robust method based on cochleagram images to detect audio copy-move forgery. The proposed method uses cochleagram images of the voiced parts of the audio to detect forgery clues in the input audio file. To this end, the audio file is first split into voiced parts using a pitch-based Voice Activity Detection (VAD) method. Each audio part is then converted into a cochleagram image, and the structural similarity index measure (SSIM) is used to calculate the similarity between cochleagram images. After the SSIM values between the cochleagram images are calculated, the proposed forgery localization algorithm is applied. In this algorithm, the SSIM values among the cochleagram images are first sorted in descending order. The length ratio between each pair of segments is then calculated to determine which entries in this descending order are duplicated segment pairs; if this ratio exceeds the specified percentage, the segment pairs are marked as forged. Finally, the proposed audio copy-move forgery detection method is evaluated against state-of-the-art approaches on two Copy-Move Forgery Detection (CMFD) databases and on forged databases created from the TIMIT and Arabic Speech Corpus databases. For the Copy-Move Forged Datasets, 95% precision, 98% recall, and 97% F-score were obtained. The experimental results show that the proposed method is significantly more robust against post-processing operations than the methods in other studies.
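The SSIM-ranking and length-ratio localization step summarized above can be illustrated with a small sketch. This is a hypothetical Python sketch, not the authors' implementation. It assumes that the cochleagram images have already been computed (the VAD and cochleagram stages are omitted) and resized to a common shape. It also uses a single-window (global) SSIM instead of the usual sliding-window SSIM and takes the length ratio as shorter/longer segment length. The value `ratio_threshold=0.9` and the rule "stop scanning at the first pair that fails the ratio test" are illustrative assumptions. The function names `global_ssim` and `localize_copy_move` are hypothetical.

```python
import itertools

import numpy as np


def global_ssim(x, y, data_range=1.0):
    """Single-window (global) SSIM between two equally sized images.

    Simplification: standard SSIM averages this quantity over local sliding
    windows; here the statistics are computed over the whole image.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    if x.shape != y.shape:
        raise ValueError("images must have the same shape")
    c1 = (0.01 * data_range) ** 2  # stabilizing constants from the SSIM definition
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx**2 + my**2 + c1) * (vx + vy + c2)
    )


def localize_copy_move(images, lengths, ratio_threshold=0.9):
    """Flag segment pairs as copy-move candidates (hypothetical sketch).

    images: one cochleagram image per voiced segment, assumed resized to a common shape.
    lengths: segment durations (e.g. in samples).
    ratio_threshold: assumed value for the length-ratio percentage.
    Returns a list of (i, j, ssim) tuples for the flagged pairs.
    """
    pairs = [
        (global_ssim(images[i], images[j]), i, j)
        for i, j in itertools.combinations(range(len(images)), 2)
    ]
    # Rank every segment pair by similarity, most similar first.
    pairs.sort(key=lambda p: p[0], reverse=True)
    forged = []
    for score, i, j in pairs:
        ratio = min(lengths[i], lengths[j]) / max(lengths[i], lengths[j])
        # Assumed stopping rule: the first pair whose lengths differ too much ends the scan.
        if ratio < ratio_threshold:
            break
        forged.append((i, j, score))
    return forged


# Toy example: segment 2 is a slightly perturbed copy of segment 0.
rng = np.random.default_rng(0)
base = rng.random((64, 64))
images = [base, rng.random((64, 64)), base + 0.01 * rng.random((64, 64)), rng.random((64, 64))]
lengths = [8000, 5000, 8100, 12000]
result = localize_copy_move(images, lengths)
print([(i, j) for i, j, _ in result])  # expected: [(0, 2)]
```

In this toy input, the copied pair (0, 2) has by far the highest SSIM and nearly equal lengths, so it is flagged. Every other pair has a length ratio below 0.9, so the scan stops after the first entry. A real implementation would use the paper's VAD, cochleagram, and threshold settings instead of these placeholders.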

Keywords