ZDDR: A Zero-Shot Defender for Adversarial Samples Detection and Restoration

Musheng Chen; Guowei He; Junhua Wu

doi:10.1109/ACCESS.2024.3356568

IEEE Access (Jan 2024)

Affiliations

Musheng Chen: School of Software Engineering, Jiangxi University of Science and Technology, Nanchang, China
Guowei He: School of Software Engineering, Jiangxi University of Science and Technology, Nanchang, China
Junhua Wu: School of Software Engineering, Jiangxi University of Science and Technology, Nanchang, China

DOI: https://doi.org/10.1109/ACCESS.2024.3356568
Journal volume & issue: Vol. 12, pp. 39081–39094

Abstract

Natural language processing (NLP) models are widely applied but remain vulnerable to adversarial inputs. Traditional defenses lean heavily on supervised detection techniques, which exposes them to problems arising from training data quality, inherent biases, noise, or adversarial inputs. This study observes that adversarial attacks commonly compromise sentence fluency. On this basis, the Zero-Shot Defender (ZDDR) is introduced for adversarial sample detection and restoration without relying on prior knowledge. ZDDR combines the log probability calculated by the model with a syntactic normativity score produced by a large language model (LLM) to detect adversarial examples. Furthermore, using strategic prompts, ZDDR guides the LLM in rephrasing adversarial content while maintaining clarity, structure, and meaning, thereby restoring the sentence from the attack. Benchmarking shows a 9% improvement in area under the receiver operating characteristic curve (AUROC) for adversarial detection over existing techniques. After restoration, model classification performance improves by 45% compared to the adversarial inputs, outperforming other restoration techniques.
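
The detection idea in the abstract (combining a log-probability signal with an LLM fluency signal, then evaluating with AUROC) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how the two signals are combined, so the weighted sum, the `weight` parameter, the function names, and the assumption that the fluency score lies in [0, 1] are all hypothetical choices made for illustration.

```python
import math
from typing import Sequence


def mean_log_prob(token_probs: Sequence[float]) -> float:
    """Average log probability of the tokens in one sentence."""
    return sum(math.log(p) for p in token_probs) / len(token_probs)


def detection_score(token_probs: Sequence[float],
                    fluency: float,
                    weight: float = 0.5) -> float:
    """Hypothetical combined score; higher means more likely adversarial.

    token_probs: per-token probabilities from some language model (assumed input).
    fluency:     an LLM-assigned fluency score, assumed here to lie in [0, 1].
    weight:      assumed mixing weight between the two signals.
    """
    # exp(mean log-prob) is the geometric mean token probability, in (0, 1].
    geo_mean = math.exp(mean_log_prob(token_probs))
    # Low probability and low fluency both push the score up.
    return weight * (1.0 - geo_mean) + (1.0 - weight) * (1.0 - fluency)


def auroc(adversarial_scores: Sequence[float],
          clean_scores: Sequence[float]) -> float:
    """AUROC as the chance that a random adversarial sample outscores
    a random clean one, counting ties as half."""
    wins = 0.0
    for a in adversarial_scores:
        for c in clean_scores:
            if a > c:
                wins += 1.0
            elif a == c:
                wins += 0.5
    return wins / (len(adversarial_scores) * len(clean_scores))
```

In this sketch, a sentence would be flagged when `detection_score` exceeds a chosen threshold; `auroc` summarizes how well the score separates adversarial from clean samples across all possible thresholds.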

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
