IEEE Access (Jan 2024)

Adversarial Robustness of Vision Transformers Versus Convolutional Neural Networks

  • Kazim Ali,
  • Muhammad Shahid Bhatti,
  • Atif Saeed,
  • Atifa Athar,
  • Mohammed A. Al Ghamdi,
  • Sultan H. Almotiri,
  • Samina Akram

DOI
https://doi.org/10.1109/ACCESS.2024.3435347
Journal volume & issue
Vol. 12
pp. 105281 – 105293

Abstract


Vision Transformers (ViTs) have proven to be a powerful alternative to Convolutional Neural Networks (CNNs) in various computer vision tasks, using self-attention to achieve remarkable results. However, the adversarial robustness of ViTs against adversarial attack methods raises critical questions, and the suitability of these models for security-related applications remains under discussion. This paper presents a novel and systematic approach to evaluating and comparing the adversarial robustness of ViTs and CNNs, focusing specifically on the image classification problem. We have performed extensive experiments using state-of-the-art adversarial example attacks, namely the Fast Gradient Sign Method (FGSM), Projected Gradient Descent (PGD), and the DeepFool Attack (DFA). The findings indicate that CNNs are more robust against simpler attacks such as FGSM, whereas ViTs show stronger resistance to more powerful attacks such as PGD and DFA. This work reveals the respective strengths and limitations of CNNs and ViTs, which is helpful for further study and for the safer and more effective deployment of these deep learning models.
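For context, the two gradient-based attacks named in the abstract can be summarized in a few lines of PyTorch. The sketch below is illustrative only: the epsilon, step size, and step count are assumed values, not the paper's experimental settings, and the model is any differentiable classifier taking images in the [0, 1] range.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, images, labels, epsilon=0.03):
    """One-step FGSM: perturb the input along the sign of the loss gradient."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    # x_adv = x + eps * sign(grad_x L), clipped to the valid pixel range
    adv = images + epsilon * images.grad.sign()
    return adv.clamp(0.0, 1.0).detach()

def pgd_attack(model, images, labels, epsilon=0.03, alpha=0.007, steps=10):
    """PGD: iterated FGSM with projection back into the L-infinity ball."""
    orig = images.clone().detach()
    adv = orig.clone()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), labels)
        grad = torch.autograd.grad(loss, adv)[0]
        adv = adv.detach() + alpha * grad.sign()
        # project back to within epsilon of the original image, then to [0, 1]
        adv = (orig + (adv - orig).clamp(-epsilon, epsilon)).clamp(0.0, 1.0)
    return adv.detach()
```

The contrast the abstract draws follows the structure of the attacks themselves: FGSM takes a single gradient step, while PGD repeats that step and projects back into the epsilon ball, making it the stronger benchmark against which ViTs are reported to hold up better.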

Keywords