IEEE Access (Jan 2023)
Detecting Fake Audio of Arabic Speakers Using Self-Supervised Deep Learning
Abstract
Audio deepfakes, in which AI-based tools are used to clone people's voices, are among the most significant concerns in audio forensics. Although voice-cloning technology was intended to improve people's lives, attackers have exploited it maliciously, compromising public safety. Machine Learning (ML) and Deep Learning (DL) methods have therefore been developed to detect imitated or synthetically faked voices; however, existing methods require massive amounts of training data or excessive pre-processing. To the authors' best knowledge, synthetic fake audio in Arabic has not yet been explored, and prior work on Arabic is limited to only one type of fakeness, namely imitation. This paper proposes Arabic-AD, a new audio deepfake detection method based on self-supervised learning that detects both synthetic and imitated voices. It also contributes to the literature by creating the first synthetic dataset of a single speaker who fluently speaks Modern Standard Arabic (MSA). In addition, accent is taken into account by collecting Arabic recordings from non-native speakers to evaluate the robustness of Arabic-AD. Three extensive experiments were conducted to evaluate the proposed method and compare it to well-known benchmarks from the literature. Arabic-AD outperformed state-of-the-art methods, achieving the lowest Equal Error Rate (EER) of 0.027% and high detection accuracy (97%), while avoiding the need for excessive training.
Keywords