IEEE Access (Jan 2024)

Enhancing Cybersecurity With P-Code Analysis and XGBoost: A Novel Approach for Malicious VBA Macro Detection in Office Documents

  • Candra Ahmadi
  • Jiann-Liang Chen
  • Yi-Cheng Lai

DOI
https://doi.org/10.1109/ACCESS.2024.3402956
Journal volume & issue
Vol. 12
pp. 71746 – 71760

Abstract

In the evolving landscape of cybersecurity, the prevalence of malicious Visual Basic for Applications (VBA) macros embedded in Office documents presents a formidable challenge. These macros, while integral to automation, have become potent vehicles for cyber-attacks, necessitating advanced detection techniques. This study introduces a comprehensive framework employing P-Code Analysis and XGBoost, a leading-edge machine learning algorithm, to address this issue. The proposed solution synergizes static analysis of VBA source code with dynamic P-Code structural analysis, enhanced by Natural Language Processing (NLP) techniques for effective feature extraction. By integrating these methodologies, our model adeptly distinguishes between benign and malicious macros, achieving an unprecedented detection accuracy of 98.70% and an F1-score of 98.81% in rigorous testing environments. The core contribution of this research lies in its innovative approach to malicious macro detection, offering a robust framework that significantly improves upon existing methods. Additionally, the utilization of XGBoost for machine learning analysis introduces a novel application in cybersecurity defenses against macro-based threats. The results underscore the efficacy of combining P-Code analysis with machine learning for cybersecurity, marking a significant stride in the detection of sophisticated cyber threats. This study not only advances the domain of cybersecurity but also lays the groundwork for future research, advocating for the exploration of further optimizations and the adaptation of our model to combat evolving attack vectors.

Recommended terms: Cybersecurity, Malicious VBA Macro Detection, P-Code Analysis, XGBoost, Machine Learning.
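
The abstract does not spell out how P-Code is turned into features, so the following is only a minimal sketch of one common NLP-style approach: treat the leading mnemonic of each disassembled P-Code line as a token and count n-grams. The sample dump, the helper names (`opcode_tokens`, `ngram_counts`, `vectorize`), and the choice of bigrams are all illustrative assumptions, not the authors' pipeline.

```python
from collections import Counter


def opcode_tokens(pcode_dump):
    """Return the first whitespace-delimited token of every non-empty line.

    Assumption: each line of the dump starts with an opcode mnemonic.
    """
    tokens = []
    for line in pcode_dump.splitlines():
        parts = line.split()
        if parts:
            tokens.append(parts[0])
    return tokens


def ngram_counts(tokens, n=2):
    """Count contiguous n-grams (as tuples) over the token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def vectorize(counts, vocabulary):
    """Project n-gram counts onto a fixed, ordered vocabulary."""
    return [counts.get(gram, 0) for gram in vocabulary]


# Hypothetical P-Code-like text, made up for illustration only.
sample = """LitStr 0x0003 "cmd"
ArgsCall Shell 0x0001
LitStr 0x0003 "cmd"
ArgsCall Shell 0x0001
"""

tokens = opcode_tokens(sample)
counts = ngram_counts(tokens, n=2)
vocabulary = sorted(counts)  # in practice, built from the training corpus
features = vectorize(counts, vocabulary)
```

Vectors of this kind, one per document, could then be passed with benign/malicious labels to a gradient-boosted tree classifier such as XGBoost. The vocabulary would need to be fixed on training data so that every document maps to the same feature positions.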

Keywords