Transactions on Fuzzy Sets and Systems (Nov 2023)
An Artificial Intelligence Framework for Supporting Coarse-Grained Workload Classification in Complex Virtual Environments
Abstract
Cloud-based machine learning tools for enhanced Big Data applications}, where the main idea is that of predicting the ``\emph{next}'' \emph{workload} occurring against the target Cloud infrastructure via an innovative \emph{ensemble-based approach} that combines the effectiveness of different well-known \emph{classifiers} in order to enhance the whole accuracy of the final classification, which is very relevant at now in the specific context of \emph{Big Data}. The so-called \emph{workload categorization problem} plays a critical role in improving the efficiency and reliability of Cloud-based big data applications. Implementation-wise, our method proposes deploying Cloud entities that participate in the distributed classification approach on top of \emph{virtual machines}, which represent classical ``commodity'' settings for Cloud-based big data applications. Given a number of known reference workloads, and an unknown workload, in this paper we deal with the problem of finding the reference workload which is most similar to the unknown one. The depicted scenario turns out to be useful in a plethora of modern information system applications. We name this problem as \emph{coarse-grained workload classification}, because, instead of characterizing the unknown workload in terms of finer behaviors, such as CPU, memory, disk, or network intensive patterns, we classify the whole unknown workload as one of the (possible) reference workloads. Reference workloads represent a category of workloads that are relevant in a given applicative environment. In particular, we focus our attention on the classification problem described above in the special case represented by \emph{virtualized environments}. Today, \emph{Virtual Machines} (VMs) have become very popular because they offer important advantages to modern computing environments such as cloud computing or server farms. In virtualization frameworks, workload classification is very useful for accounting, security reasons, or user profiling. Hence, our research makes more sense in such environments, and it turns out to be very useful in a special context like Cloud Computing, which is emerging now. In this respect, our approach consists of running several machine learning-based classifiers of different workload models, and then deriving the best classifier produced by the \emph{Dempster-Shafer Fusion}, in order to magnify the accuracy of the final classification. Experimental assessment and analysis clearly confirm the benefits derived from our classification framework. The running programs which produce unknown workloads to be classified are treated in a similar way. A fundamental aspect of this paper concerns the successful use of data fusion in workload classification. Different types of metrics are in fact fused together using the Dempster-Shafer theory of evidence combination, giving a classification accuracy of slightly less than $80\%$. The acquisition of data from the running process, the pre-processing algorithms, and the workload classification are described in detail. Various classical algorithms have been used for classification to classify the workloads, and the results are compared.
Keywords