IEEE Access (Jan 2021)

Recurrent Neural Networks Based Online Behavioural Malware Detection Techniques for Cloud Infrastructure

  • Jeffrey C. Kimmel,
  • Andrew D. Mcdole,
  • Mahmoud Abdelsalam,
  • Maanak Gupta,
  • Ravi Sandhu

DOI
https://doi.org/10.1109/ACCESS.2021.3077498
Journal volume & issue
Vol. 9
pp. 68066 – 68080

Abstract

Read online

Several organizations are utilizing cloud technologies and resources to run a range of applications. These services help businesses save on hardware management, scalability and maintainability concerns of underlying infrastructure. Key cloud service providers (CSPs) like Amazon, Microsoft and Google offer Infrastructure as a Service (IaaS) to meet the growing demand of such enterprises. This increased utilization of cloud platforms has made it an attractive target to the attackers, thereby, making the security of cloud services a top priority for CSPs. In this respect, malware has been recognized as one of the most dangerous and destructive threats to cloud infrastructure (IaaS). In this paper, we study the effectiveness of Recurrent Neural Networks (RNNs) based deep learning techniques for detecting malware in cloud Virtual Machines (VMs). We focus on two major RNN architectures: Long Short Term Memory RNNs (LSTMs) and Bidirectional RNNs (BIDIs). These models learn the behavior of malware over time based on run-time fine-grained processes system features such as CPU, memory, and disk utilization. We evaluate our approach on a dataset of 40,680 malicious and benign samples. The process level features were collected using real malware running in an open online cloud environment with no restrictions, which is important to emulate practical cloud provider settings and also capture the true behaviour of stealth and sophisticated malware. Both our LSTM and BIDI models achieve high detection rates over 99% for different evaluation metrics. In addition, an analysis study is conducted to understand the significance of input data representations. Our results suggest that in particular cases, input ordering does have some affect on the performance of the trained RNN models.

Keywords