Transactions on Fuzzy Sets and Systems (Nov 2023)

‎An Artificial Intelligence Framework for Supporting Coarse-Grained Workload Classification in Complex Virtual Environments

  • Alfredo Cuzzocrea,
  • Enzo Mumolo,
  • Islam Belmerabet,
  • Abderraouf Hafsaoui

DOI
https://doi.org/10.30495/tfss.2023.1990572.1079
Journal volume & issue
Vol. 2, no. 2
pp. 155 – 183

Abstract

Read online

Cloud-based machine learning tools for enhanced Big Data applications}‎, ‎where the main idea is that of predicting the ``\emph{next}'' \emph{workload} occurring against the target Cloud infrastructure via an innovative \emph{ensemble-based approach} that combines the effectiveness of different well-known \emph{classifiers} in order to enhance the whole accuracy of the final classification‎, ‎which is very relevant at now in the specific context of \emph{Big Data}‎. ‎The so-called \emph{workload categorization problem} plays a critical role in improving the efficiency and reliability of Cloud-based big data applications‎. ‎Implementation-wise‎, ‎our method proposes deploying Cloud entities that participate in the distributed classification approach on top of \emph{virtual machines}‎, ‎which represent classical ``commodity'' settings for Cloud-based big data applications‎. ‎Given a number of known reference workloads‎, ‎and an unknown workload‎, ‎in this paper we deal with the problem of finding the reference workload which is most similar to the unknown one‎. ‎The depicted scenario turns out to be useful in a plethora of modern information system applications‎. ‎We name this problem as \emph{coarse-grained workload classification}‎, ‎because‎, ‎instead of characterizing the unknown workload in terms of finer behaviors‎, ‎such as CPU‎, ‎memory‎, ‎disk‎, ‎or network intensive patterns‎, ‎we classify the whole unknown workload as one of the (possible) reference workloads‎. ‎Reference workloads represent a category of workloads that are relevant in a given applicative environment‎. ‎In particular‎, ‎we focus our attention on the classification problem described above in the special case represented by \emph{virtualized environments}‎. ‎Today‎, ‎\emph{Virtual Machines} (VMs) have become very popular because they offer important advantages to modern computing environments such as cloud computing or server farms‎. ‎In virtualization frameworks‎, ‎workload classification is very useful for accounting‎, ‎security reasons‎, ‎or user profiling‎. ‎Hence‎, ‎our research makes more sense in such environments‎, ‎and it turns out to be very useful in a special context like Cloud Computing‎, ‎which is emerging now‎. ‎In this respect‎, ‎our approach consists of running several machine learning-based classifiers of different workload models‎, ‎and then deriving the best classifier produced by the \emph{Dempster-Shafer Fusion}‎, ‎in order to magnify the accuracy of the final classification‎. ‎Experimental assessment and analysis clearly confirm the benefits derived from our classification framework‎. ‎The running programs which produce unknown workloads to be classified are treated in a similar way‎. ‎A fundamental aspect of this paper concerns the successful use of data fusion in workload classification‎. ‎Different types of metrics are in fact fused together using the Dempster-Shafer theory of evidence combination‎, ‎giving a classification accuracy of slightly less than $80\%$‎. ‎The acquisition of data from the running process‎, ‎the pre-processing algorithms‎, ‎and the workload classification are described in detail‎. ‎Various classical algorithms have been used for classification to classify the workloads‎, ‎and the results are compared‎.

Keywords