Frontiers in Computer Science (Feb 2024)

Phantom in the opera: adversarial music attack for robot dialogue system

  • Sheng Li
  • Jiyi Li
  • Yang Cao

DOI
https://doi.org/10.3389/fcomp.2024.1355975
Journal volume & issue
Vol. 6

Abstract

This study examines how vulnerable the automatic speech recognition (ASR) module of robot dialogue systems is to adversarial music attacks, using music as a natural camouflage for such attacks. We propose a novel method that hides ghost speech commands in a music clip by slightly perturbing its raw waveform. We apply the attack to the time-delay neural network (TDNN), an ASR model popular in industry and widely used for speech and speaker recognition. Our experiment demonstrates that adversarial music crafted by the attack can easily mislead industry-level TDNN models into picking up the ghost commands with high success rates, yet to the human ear it sounds no different from the original music. This reveals a serious threat that adversarial music poses to robot dialogue systems and calls for effective defenses against such stealthy attacks.
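
The general idea of hiding a command by slightly perturbing a raw waveform can be illustrated with a toy sketch. This is not the authors' method and does not involve a TDNN: it assumes a hypothetical linear scorer (`weights`, `bias`) standing in for the ASR model, a synthetic sine tone standing in for the music clip, and an illustrative amplitude budget `epsilon`. It only shows how a small, bounded change to every sample can flip a model's decision.

```python
import numpy as np


def detects_command(waveform, weights, bias):
    """Hypothetical stand-in for an ASR model: fires when the linear score is positive."""
    return float(waveform @ weights + bias) > 0.0


def craft_adversarial_music(music, weights, epsilon=0.01, step=0.002, iters=50):
    """Iterative sign-gradient attack kept inside an L-infinity ball around `music`."""
    adv = music.copy()
    for _ in range(iters):
        # For a linear scorer the gradient of the score w.r.t. the waveform is
        # simply `weights`; a real model would need its gradient recomputed here.
        adv = adv + step * np.sign(weights)
        # Project back so that no sample moves by more than `epsilon`.
        adv = np.clip(adv, music - epsilon, music + epsilon)
        # Keep samples inside the valid amplitude range.
        adv = np.clip(adv, -1.0, 1.0)
    return adv


# Assumed toy data: one second of a 440 Hz tone at 16 kHz as the "music".
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
music = 0.5 * np.sin(2 * np.pi * 440.0 * t)

# Assumed toy model: random unit-norm weights, with the bias chosen so that
# the clean clip scores -0.5 (no command detected).
weights = rng.standard_normal(music.size)
weights /= np.linalg.norm(weights)
bias = -(music @ weights) - 0.5

adv = craft_adversarial_music(music, weights)
clean_hit = detects_command(music, weights, bias)
adv_hit = detects_command(adv, weights, bias)
max_change = float(np.max(np.abs(adv - music)))
```

In this toy setting the clean clip is not flagged while the perturbed clip is, even though no sample changes by more than `epsilon`. A realistic attack would replace the linear scorer with a differentiable ASR model and a loss over the target transcript.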

Keywords