Multi Pattern Features-Based Spoofing Detection Mechanism Using One Class Learning

Beste Ustubioglu; Gul Tahaoglu; Arda Ustubioglu; Guzin Ulutas; Irene Amerini; Muhammed Kilic

doi:10.1109/ACCESS.2024.3447572

IEEE Access (Jan 2024)

Affiliations

Beste Ustubioglu: Computer Engineering Department, Karadeniz Technical University, Trabzon, Türkiye
Gul Tahaoglu: Computer Engineering Department, Karadeniz Technical University, Trabzon, Türkiye
Arda Ustubioglu: Department of Management Information Systems, Trabzon University, Trabzon, Türkiye
Guzin Ulutas: Computer Engineering Department, Karadeniz Technical University, Trabzon, Türkiye
Irene Amerini: Department of Computer, Control and Management Engineering, Sapienza University of Rome, Rome, Italy
Muhammed Kilic: Computer Engineering Department, Karadeniz Technical University, Trabzon, Türkiye

DOI: https://doi.org/10.1109/ACCESS.2024.3447572
Journal volume & issue: Vol. 12
pp. 117523 – 117540

Abstract

Automatic speaker verification (ASV) systems are vulnerable to voice spoofing attacks such as replay, voice conversion (VC), and speech synthesis. Using advanced audio manipulation techniques, malicious users can, for example, gain control of someone's bank account or take over a smart home. This study presents a multi-pattern features-based spoofing detection mechanism that uses a modified ResNet architecture and an OC-Softmax layer to detect various logical access (LA) and physical access (PA) spoofing attacks. We propose a novel pattern features-based audio spoof detection scheme with three branches, each evaluating a different pattern on the Mel spectrogram of the audio file. This is the first work on audio spoofing detection to use three different pattern representations of the Mel spectrogram with a modified ResNet architecture and an OC-Softmax layer. The proposed network extracts pattern images from the Mel spectrogram and feeds each of them into a modified ResNet. At the last step of each network, OC-Softmax produces a score for the current pattern image, and the method then fuses the three scores to label the input audio. Experimental results on the ASVspoof 2019 and ASVspoof 2021 corpora show that the proposed method achieves better results on the ASVspoof 2019 challenges than state-of-the-art methods. For example, in the logical access scenario, our model improves the tandem detection cost function (t-DCF) and equal error rate (EER) scores by 0.06% and 2.14%, respectively, compared with state-of-the-art methods. Additionally, the experiments show that the proposed fused decision improves the performance of the system.
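
The scoring and fusion steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not state how the three branch scores are fused, so the (optionally weighted) average in `fuse_scores` is an assumption, as are all function names. The loss follows the commonly used OC-Softmax formulation, in which the score is the cosine similarity between an utterance embedding and a learned bona fide direction; the margins (0.9, 0.2) and the scale (20) are illustrative defaults, not values taken from this paper.

```python
import numpy as np


def oc_softmax_scores(embeddings, w):
    """Cosine similarity between each embedding and the bona fide direction w."""
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = w / np.linalg.norm(w)
    return x @ w


def oc_softmax_loss(scores, labels, m_real=0.9, m_fake=0.2, alpha=20.0):
    """One-class softmax loss; labels: 0 = bona fide, 1 = spoof.

    Bona fide scores are pushed above m_real, spoof scores below m_fake.
    """
    labels = np.asarray(labels)
    margins = np.where(labels == 0, m_real, m_fake)
    signs = np.where(labels == 0, 1.0, -1.0)
    # np.logaddexp(0, z) evaluates log(1 + exp(z)) in a numerically stable way.
    return float(np.mean(np.logaddexp(0.0, alpha * (margins - scores) * signs)))


def fuse_scores(branch_scores, weights=None):
    """Assumed fusion rule: (weighted) mean over branches.

    branch_scores has shape (n_branches, n_utterances).
    """
    branch_scores = np.asarray(branch_scores, dtype=float)
    return np.average(branch_scores, axis=0, weights=weights)


# Toy usage: three hypothetical branch score vectors for two utterances.
fused = fuse_scores([[0.95, 0.10], [0.80, 0.30], [0.90, -0.20]])
is_bona_fide = fused >= 0.5  # the 0.5 threshold is a placeholder, not a tuned value
```

In practice the decision threshold would be chosen on a development set, for instance at the equal error rate operating point.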

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
