IEEE Access (Jan 2023)

Experiences With Deep Learning Enhanced Steering Mechanisms for Debugging of Fundamental Cloud Services

  • Robert Lovas
  • Erno Rigo
  • Daniel Unyi
  • Balint Gyires-Toth

DOI
https://doi.org/10.1109/ACCESS.2023.3243201
Journal volume & issue
Vol. 11
pp. 26403 – 26418

Abstract

Cloud architecture blueprints, or reference architectures, allow existing knowledge and best practices to be reused when creating new cloud-native solutions. Debugging reference architecture candidates (or their new versions) is therefore crucial, but it is also tedious and time-consuming because complex services are deployed in typically multi-tenant and non-deterministic environments. In debugging, testing, and maintenance scenarios, greater test coverage (and eventually improved reliability) may be achievable by modelling and verifying at least the most fundamental building blocks and their interconnections. The main objective of our work is to integrate stochastic modelling and verification techniques based on deep learning methods into the debugging cycle in order to handle large state spaces more efficiently, i.e., by steering the state-space traversal towards suspicious situations that may result in potential bugs in the actual system. For this purpose, the presented and illustrated approach combines (among others) Continuous Time Markov Chain (CTMC) modelling techniques with deep learning methods, including autoencoder, Long Short-Term Memory (LSTM), and Graph Neural Network (GNN) models. Our experiences are summarized for widespread cloud design patterns, including load balancing and service mesh topologies. According to the results, the debugging cycle can be partly automated through the application of deep learning methods: the autoencoders are able to detect erroneous load balancer behaviors (anomalies) in complex configurations; the LSTMs implicitly demonstrate some random nature of the inspected processes; and the GNNs exploit the additional topology-related information in service meshes.
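
To make the CTMC ingredient concrete, the following is a minimal illustrative sketch and not the authors' model. It simulates a load balancer in front of two backends as a CTMC, where the state is the pair of queue lengths. The arrival rate `lam`, service rate `mu`, queue capacity `cap`, and the join-shortest-queue routing policy are all assumptions chosen for illustration; the abstract does not specify them.

```python
import random

def transitions(state, lam=3.0, mu=2.0, cap=5):
    """Return (rate, next_state) pairs enabled in state = (q1, q2).

    lam, mu, cap and the routing policy are hypothetical values,
    not taken from the paper.
    """
    q1, q2 = state
    out = []
    # Arrival: route to the shorter queue (ties go to backend 1);
    # the request is dropped when both queues are full.
    if q1 <= q2 and q1 < cap:
        out.append((lam, (q1 + 1, q2)))
    elif q2 < cap:
        out.append((lam, (q1, q2 + 1)))
    # Service completion at each non-empty backend.
    if q1 > 0:
        out.append((mu, (q1 - 1, q2)))
    if q2 > 0:
        out.append((mu, (q1, q2 - 1)))
    return out

def simulate(steps, seed=0, start=(0, 0)):
    """Sample one CTMC trajectory as a list of (time, state) pairs."""
    rng = random.Random(seed)
    t, state = 0.0, start
    trace = [(t, state)]
    for _ in range(steps):
        enabled = transitions(state)
        rates = [rate for rate, _ in enabled]
        targets = [nxt for _, nxt in enabled]
        # Holding time is exponential with the total exit rate.
        t += rng.expovariate(sum(rates))
        # The next state is chosen with probability proportional to its rate.
        state = rng.choices(targets, weights=rates)[0]
        trace.append((t, state))
    return trace
```

A trajectory from `simulate(1000)` could then be filtered for states a tester considers suspicious, for example `[s for _, s in trace if abs(s[0] - s[1]) >= 3]`. Traces of this kind are the sort of sequence data on which an anomaly detector might be trained, although how the paper couples the CTMC model with its deep learning models is not described in the abstract.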

Keywords