IEEE Access (Jan 2022)

Malware Detection: A Framework for Reverse Engineered Android Applications Through Machine Learning Algorithms

  • Beenish Urooj,
  • Munam Ali Shah,
  • Carsten Maple,
  • Muhammad Kamran Abbasi,
  • Sidra Riasat

DOI
https://doi.org/10.1109/ACCESS.2022.3149053
Journal volume & issue
Vol. 10
pp. 89031 – 89050

Abstract

Read online

Today, Android is one of the most used operating systems in smartphone technology. This is the main reason, Android has become the favorite target for hackers and attackers. Malicious codes are being embedded in Android applications in such a sophisticated manner that detecting and identifying an application as a malware has become the toughest job for security providers. In terms of ingenuity and cognition, Android malware has progressed to the point where they’re more impervious to conventional detection techniques. Approaches based on machine learning have emerged as a much more effective way to tackle the intricacy and originality of developing Android threats. They function by first identifying current patterns of malware activity and then using this information to distinguish between identified threats and unidentified threats with unknown behavior. This research paper uses Reverse Engineered Android applications’ features and Machine Learning algorithms to find vulnerabilities present in Smartphone applications. Our contribution is twofold. Firstly, we propose a model that incorporates more innovative static feature sets with the largest current datasets of malware samples than conventional methods. Secondly, we have used ensemble learning with machine learning algorithms i.e., AdaBoost, Support Vector Machine (SVM), etc. to improve our model’s performance. Our experimental results and findings exhibit 96.24% accuracy to detect extracted malware from Android applications, with a 0.3 False Positive Rate (FPR). The proposed model incorporates ignored detrimental features such as permissions, intents, Application Programming Interface (API) calls, and so on, trained by feeding a solitary arbitrary feature, extracted by reverse engineering as an input to the machine.

Keywords