Sensors (Oct 2024)

ViTDroid: Vision Transformers for Efficient, Explainable Attention to Malicious Behavior in Android Binaries

  • Toqeer Ali Syed,
  • Mohammad Nauman,
  • Sohail Khan,
  • Salman Jan,
  • Megat F. Zuhairi

DOI
https://doi.org/10.3390/s24206690
Journal volume & issue
Vol. 24, no. 20
p. 6690

Abstract

Read online

Smartphones are intricately connected to the modern society. The two widely used mobile phone operating systems, iOS and Android, profoundly affect the lives of millions of people. Android presently holds a market share of close to 71% among these two. As a result, if personal information is not securely protected, it is at tremendous risk. On the other hand, mobile malware has seen a year-on-year increase of more than 42% globally in 2022 mid-year. Any group of human professionals would have a very tough time detecting and removing all of this malware. For this reason, deep learning in particular has been used recently to overcome this problem. Deep learning models, however, were primarily created for picture analysis. Despite the fact that these models have shown promising findings in the field of vision, it has been challenging to fully comprehend what the characteristics recovered by deep learning models are in the area of malware. Furthermore, the actual potential of deep learning for malware analysis has not yet been fully realized due to the translation invariance trait of well-known models based on CNN. In this paper, we present ViTDroid, a novel model based on vision transformers for the deep learning-based analysis of opcode sequences of Android malware samples from large real-world datasets. We have been able to achieve a false positive rate of 0.0019 as compared to the previous best of 0.0021. However, this incremental improvement is not the major contribution of our work. Our model aims to make explainable predictions, i.e., it not only performs the classification of malware with high accuracy, but it also provides insights into the reasons for this classification. The model is able to pinpoint the malicious behavior-causing instructions in the malware samples. This means that our model can actually aid in the field of malware analysis itself by providing insights to human experts, thus leading to further improvements in this field.

Keywords