IEEE Access (Jan 2019)

Scalable Prediction of Service-Level Events in Datacenter Infrastructure Using Deep Neural Networks

  • Alberto Mozo,
  • Itai Segall,
  • Udi Margolin,
  • Sandra Gomez-Canaval

DOI
https://doi.org/10.1109/ACCESS.2019.2956182
Journal volume & issue
Vol. 7
pp. 179779 – 179798

Abstract

The complexity of cloud datacenter scenarios poses new challenges in infrastructure management processes, such as the impracticality of collecting specific service-level events from inside the datacenter infrastructure and the scalability issues that can appear during event monitoring when thousands of virtual machines have to be polled at a granularity of seconds. Therefore, it would be desirable to provide mechanisms for obtaining these types of events without incurring the previously described problems. To this end, we propose a generic and scalable method based on the application of deep neural network architectures for predicting service-level events using only a reduced number of generic datacenter infrastructure statistics that can be monitored in a scalable way. We demonstrate, in a controlled scenario of a real datacenter and using only three variables from a physical machine, that it is possible to predict events in real time and with decent accuracy, without needing to deploy any meter in the end-user equipment. Specifically, we demonstrate this over two service-level events: i) the so-called Noisy Neighbors effect, a harmful situation that appears in physical machines due to the interference created by the interaction of virtual machines running on them; and ii) the jitter values of a multimedia call running in a virtual machine. We set up a testbed in a real datacenter deploying physical and virtual machines, running a large number of different experiments for 1000 hours and collecting samples at a 10-second granularity in a dataset of 260,000 records. Two different scenarios, in which the training and testing data sets contain significant statistical differences, are deployed to demonstrate the better generalization ability of deep models in changing scenarios when compared with traditional Machine Learning techniques. A set of different deep architectures is proposed for both use cases, and approximately 4,000 deep models were trained and tested. In both use cases, the best deep models show good performance when predicting service-level events, even if the inputs do not exactly follow the statistical patterns of the data used during training.

Keywords