Journal of Cloud Computing: Advances, Systems and Applications (Nov 2022)

A malware detection system using a hybrid approach of multi-heads attention-based control flow traces and image visualization

  • Farhan Ullah,
  • Gautam Srivastava,
  • Shamsher Ullah

DOI
https://doi.org/10.1186/s13677-022-00349-8
Journal volume & issue
Vol. 11, no. 1
pp. 1 – 21

Abstract

Read online

Abstract Android is the most widely used mobile platform, making it a prime target for malicious attacks. Therefore, it is imperative to effectively circumvent these attacks. Recently, machine learning has been a promising solution for malware detection, which relies on distinguishing features. While machine learning-based malware scanners have a large number of features, adversaries can avoid detection by using feature-related expertise. Therefore, one of the main tasks of the Android security industry is to consistently propose cutting-edge features that can detect suspicious activity. This study presents a novel feature representation approach for malware detection that combines API-Call Graphs (ACGs) with byte-level image representation. First, the reverse engineering procedure is used to obtain the Java programming codes and Dalvik Executable (DEX) file from Android Package Kit (APK). Second, to depict Android apps with high-level features, we develop ACGs by mining API-Calls and API sequences from Control Flow Graph (CFG). The ACGs can act as a digital fingerprint of the actions taken by Android apps. Next, the multi-head attention-based transfer learning method is used to extract trained features vector from ACGs. Third, the DEX file is converted to a malware image, and the texture features are extracted and highlighted using a combination of FAST (Features from Accelerated Segment Test) and BRIEF (Binary Robust Independent Elementary Features). Finally, the ACGs and texture features are combined for effective malware detection and classification. The proposed method uses a customized dataset prepared from the CIC-InvesAndMal2019 dataset and outperforms state-of-the-art methods with 99.27% accuracy.

Keywords