ViTDroid: Vision Transformers for Efficient, Explainable Attention to Malicious Behavior in Android Binaries

Toqeer Ali Syed; Mohammad Nauman; Sohail Khan; Salman Jan; Megat F. Zuhairi

doi:10.3390/s24206690

Sensors (Oct 2024)

Affiliations

Toqeer Ali Syed: Faculty of Computer and Information Systems, Islamic University of Madinah, Madinah 42351, Saudi Arabia
Mohammad Nauman: Department of Computer Science, Effat College of Engineering, Effat University, Jeddah 22332, Saudi Arabia
Sohail Khan: Department of Computer Science, Effat College of Engineering, Effat University, Jeddah 22332, Saudi Arabia
Salman Jan: Department of Information Technology, Alburaimi University College, Alburaimi 512, Oman
Megat F. Zuhairi: Malaysian Institute of Information Technology, Universiti Kuala Lumpur, Kuala Lumpur 50250, Malaysia

DOI: https://doi.org/10.3390/s24206690
Journal volume & issue: Vol. 24, no. 20, p. 6690

Abstract

Smartphones are deeply woven into modern society. The two most widely used mobile operating systems, iOS and Android, profoundly affect the lives of millions of people, and of the two, Android currently holds a market share of close to 71%. As a result, personal information that is not securely protected is at tremendous risk. At the same time, mobile malware saw a year-on-year increase of more than 42% globally as of mid-2022. No group of human professionals could realistically detect and remove all of this malware. For this reason, deep learning in particular has recently been applied to this problem. Deep learning models, however, were primarily designed for image analysis. Although these models have shown promising results in the vision domain, it has been difficult to understand what the features extracted by deep learning models represent in the malware domain. Furthermore, the full potential of deep learning for malware analysis has not yet been realized because of the translation invariance of well-known CNN-based models. In this paper, we present ViTDroid, a novel model based on vision transformers for the deep learning-based analysis of opcode sequences of Android malware samples from large real-world datasets. We achieve a false positive rate of 0.0019, compared to the previous best of 0.0021. However, this incremental improvement is not the major contribution of our work. Our model aims to make explainable predictions: it not only classifies malware with high accuracy but also provides insights into the reasons for each classification. The model is able to pinpoint the instructions that cause the malicious behavior in malware samples. Our model can therefore aid malware analysis itself by providing insights to human experts, leading to further improvements in the field.

Published in Sensors

ISSN: 1424-8220 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/sensors
