Use of Ensemble Learning to Detect Buffer Overflow Exploitation

Ayman Youssef; Mohamed Abdelrazek; Chandan Karmakar

doi:10.1109/ACCESS.2023.3279280

IEEE Access (Jan 2023)

Affiliations

Ayman Youssef: Faculty of Science, Engineering, and Built Environment, School of Information Technology, Deakin University, Melbourne, VIC, Australia
Mohamed Abdelrazek: A2I2D, Applied Artificial Intelligence Institute, Deakin University, Melbourne, VIC, Australia
Chandan Karmakar: Faculty of Science, Engineering, and Built Environment, School of Information Technology, Deakin University, Melbourne, VIC, Australia

DOI: https://doi.org/10.1109/ACCESS.2023.3279280
Journal volume & issue: Vol. 11
pp. 52009 – 52025

Abstract

Software exploitation detection remains an unresolved problem. Software exploits that target known and unknown vulnerabilities are constantly used in attacks. Signature-based detection techniques are limited to known exploits and are susceptible to circumvention. Current research on the use of Machine Learning (ML) for software exploitation detection is limited in both quantity and range of use cases. Existing research lacks the use of public datasets, discussion of feature importance, and elaboration of the parameters that affect data preparation and, subsequently, model performance. This paper presents ML models based on different ensemble algorithms to detect software exploitation using runtime traces. We focus on buffer overflow vulnerabilities in user-space applications within Windows Operating Systems (OS), given the prevalence of both this type of vulnerability and the OS. We used a publicly available raw dataset of 11 Windows applications under exploitation. Multiple distinct models (based on Random Forest and XGBoost) were created and tested. Testing was performed several times using various aggregation parameters and different testing applications. Our results demonstrate that we can achieve up to 100% recall with a 0% false positive rate. We report on the different parameters that must be addressed to curate runtime traces and demonstrate their impact on the performance of the ML models. We demonstrate that properly training models on a subset of exploitation techniques enables them to detect techniques never seen before, such as return-oriented programming. Finally, we conclude with a discussion of the features that had the highest impact on each of the models, along with the key takeaways.
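
The two headline metrics in the abstract, recall and false positive rate, follow the standard confusion-matrix definitions. The sketch below is illustrative only and is not the authors' evaluation code: the function name `recall_and_fpr` and the label lists are hypothetical, and the labels are made-up values, not results from the paper. It assumes binary labels where 1 marks a trace window under exploitation and 0 marks benign behavior.

```python
def recall_and_fpr(y_true, y_pred):
    """Return (recall, false positive rate) for binary labels.

    Assumption: 1 = exploitation, 0 = benign.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # exploits caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # exploits missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign passed

    # Guard against empty classes to avoid division by zero.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return recall, fpr


# Hypothetical ground truth: 3 exploitation windows, 5 benign windows.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]

# A detector that gets every window right: 100% recall, 0% FPR.
perfect = recall_and_fpr(y_true, [1, 1, 1, 0, 0, 0, 0, 0])

# A detector that misses one exploit and raises one false alarm.
noisy = recall_and_fpr(y_true, [1, 1, 0, 1, 0, 0, 0, 0])

print(f"perfect: recall={perfect[0]:.2f}, fpr={perfect[1]:.2f}")
print(f"noisy:   recall={noisy[0]:.2f}, fpr={noisy[1]:.2f}")
```

Reporting both numbers together matters for a detector of this kind: a model that flags everything reaches 100% recall trivially, so the 0% false positive rate is what makes the recall figure meaningful.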

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
