IEEE Access (Jan 2025)

PermGuard: A Scalable Framework for Android Malware Detection Using Permission-to-Exploitation Mapping

  • Arvind Prasad,
  • Shalini Chandra,
  • Mueen Uddin,
  • Taher Al-Shehari,
  • Nasser A. Alsadhan,
  • Syed Sajid Ullah

DOI
https://doi.org/10.1109/ACCESS.2024.3523629
Journal volume & issue
Vol. 13
pp. 507–528

Abstract

Android, the world’s most widely used mobile operating system, is increasingly targeted by malware due to its open-source nature, high customizability, and integration with Google services. The growing reliance on mobile devices significantly raises the risk of malware attacks, especially for non-technical users who often grant permissions without careful evaluation, with potentially devastating effects. This paper introduces PermGuard, a scalable framework for Android malware detection that maps permissions to exploitation techniques and employs incremental learning to detect malicious apps. It presents a novel technique for constructing the PermGuard dataset by mapping Android permissions to exploitation techniques, providing a comprehensive understanding of how permissions can be misused by malware. The dataset consists of 55,911 benign and 55,911 malware apps, offering a balanced foundation for analysis. Additionally, a new similarity-based selective training strategy reduces the data required to train an incremental learning-based model, focusing on the most relevant samples to improve efficiency. To ensure robustness and accuracy, the model adopts a test-then-train approach, first testing on incoming application data to identify weaknesses and then refining the training process. The framework’s resilience is evaluated against adversarial attacks, demonstrating its ability to withstand attempts to bypass or deceive the detection mechanism. Designed for scalability, PermGuard can handle large and continuously growing datasets, making it suitable for real-world deployment. Empirical results show that the model achieved an accuracy of 0.9933 on real datasets and 0.9828 on synthetic datasets, demonstrating strong resilience against both real and adversarial attacks.

Keywords