IEEE Access (Jan 2025)

Imitation Game for Adversarial Disillusion With Chain-of-Thought Reasoning in Generative AI

  • Ching-Chun Chang,
  • Fan-Yun Chen,
  • Shih-Hong Gu,
  • Kai Gao,
  • Hanrui Wang,
  • Isao Echizen

DOI
https://doi.org/10.1109/access.2025.3574016
Journal volume & issue
Vol. 13
pp. 95085 – 95093

Abstract

As the cornerstone of artificial intelligence, machine perception confronts a fundamental threat posed by adversarial illusions. These adversarial attacks manifest in two primary forms: deductive illusion, where specific stimuli are crafted based on the victim model’s general decision logic, and inductive illusion, where the victim model’s general decision logic is shaped by specific stimuli. The former exploits the model’s decision boundaries to create a stimulus that, when applied, interferes with its decision-making process. The latter reinforces a conditioned reflex in the model, embedding a backdoor during its learning phase that, when triggered by a stimulus, causes aberrant behaviour. The multifaceted nature of adversarial illusions calls for a unified defence framework that addresses vulnerabilities across the various forms of attack. In this study, we propose a disillusion paradigm based on the concept of an imitation game. At the heart of the imitation game lies a multimodal generative agent, steered by chain-of-thought reasoning, which observes, internalizes and reconstructs the semantic essence of a sample, liberated from the classic pursuit of reversing the sample to its original state. As a proof of concept, we conduct experimental simulations using a multimodal generative dialogue agent and evaluate the methodology under a variety of attack scenarios. Experimental results demonstrate that the proposed framework consistently neutralizes both deductive and inductive adversarial illusions across diverse white-box and black-box attack scenarios.
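The observe–internalize–reconstruct loop described in the abstract can be illustrated schematically. The sketch below is not the authors' implementation: every function is a hypothetical stand-in (a real system would call a multimodal generative model), and the "perturbation" is simulated by a text marker. The point it shows is the paradigm's key idea: the clean sample is regenerated from the internalized semantics, so the adversarial perturbation never needs to be inverted.

```python
# Hypothetical sketch of the imitation-game disillusion loop.
# All functions are stand-ins for calls to a multimodal generative agent.

def observe(sample: str) -> str:
    """Stand-in for the agent describing the sample (e.g., captioning an image).

    The description captures semantics only; here we model the adversarial
    perturbation as a marker that perception does not carry into the description.
    """
    return sample.replace("<perturbation>", "")

def internalize(description: str) -> str:
    """Stand-in for chain-of-thought refinement of the observed description."""
    # A real agent would reason step by step to distil the semantic essence;
    # here we simply normalize the description.
    return description.strip()

def reconstruct(essence: str) -> str:
    """Stand-in for regenerating a clean sample from the semantic essence alone."""
    return f"reconstruction of: {essence}"

def disillusion(sample: str) -> str:
    """Observe, internalize, and reconstruct -- never invert the perturbation."""
    return reconstruct(internalize(observe(sample)))

print(disillusion("a photo of a cat <perturbation>"))
# The adversarial marker never reaches the reconstructed sample.
```

Because the reconstruction depends only on the internalized semantics, the same loop applies whether the perturbation came from a deductive (evasion) or inductive (backdoor trigger) illusion, which is what motivates a unified defence.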

Keywords