IEEE Access (Jan 2024)

Voice Adversarial Sample Generation Method for Ultrasonicization of Motion Noise

  • Jun Wang
  • Juan Liu

DOI
https://doi.org/10.1109/ACCESS.2024.3506605
Journal volume & issue
Vol. 12
pp. 177996–178009

Abstract

Recent advances in deep learning for voice processing have raised significant security and privacy concerns. This paper presents an adversarial sample generation method, termed the "Ultrasonic Attack," designed to covertly manipulate downstream voice tasks, including voice emotion classification, voice synthesis, and voice recognition processes such as the biometric authentication used in sporting events. The method is distinguished by a multi-feature fitting strategy that precisely targets and alters the key voice attributes on which these downstream tasks depend. By embedding ultrasonic noise into the original vocal recordings, the attack misleads sophisticated deep learning systems while remaining inaudible to the human ear, producing erroneous outcomes in voice-based applications. The implications are especially serious in high-stakes settings such as athlete identity verification and voice-command integrity in sports technology systems. Extensive experiments validate the effectiveness of the Ultrasonic Attack: compared with state-of-the-art adversarial sample generation techniques, it achieves superior performance on tasks as varied as voice emotion classification and speaker identification. The resulting adversarial samples not only attack more effectively but also preserve the natural characteristics of the voice, underscoring the need for stronger security in speech-processing technologies.

Keywords