High-Confidence Computing (Dec 2023)

Machine learning job failure analysis and prediction model for the cloud environment

  • Harikrishna Bommala,
  • Uma Maheswari V.,
  • Rajanikanth Aluvalu,
  • Swapna Mudrakola

Journal volume & issue
Vol. 3, no. 4
p. 100165

Abstract

Reliable and accessible cloud applications are essential for the future of ubiquitous computing, smart appliances, and electronic health. Owing to the vastness and diversity of the cloud, many cloud services, both physical and logical, experience failures. Using currently accessible traces, we assessed and characterized the behaviors of successful and unsuccessful jobs. We devised and implemented a method to forecast which jobs will fail. The proposed method helps cloud applications use resources more efficiently. An in-depth evaluation of the proposed model was conducted using the publicly available Google Cluster, Mustang, and Trinity traces. The traces were also fed into several different machine learning models to select the most reliable model. Our evaluation shows that the model performs well in terms of accuracy, F1-score, and recall. Several measures, such as forecasting job failures, designing scheduling algorithms, modifying priority criteria, and restricting task resubmission, may increase cloud service dependability and availability.
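
The abstract names accuracy, recall, and F1-score as the criteria for comparing candidate models. As a rough illustration only, the sketch below computes these metrics for binary job-outcome labels and picks the candidate with the highest score. The function names, the label encoding (1 = failed, 0 = succeeded), the model names, and the toy predictions are all assumptions made for this example; none of them come from the paper or from the Google Cluster, Mustang, or Trinity traces.

```python
from typing import Dict, List


def failure_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for binary labels.

    Assumed encoding (not from the paper): 1 = job failed, 0 = job succeeded.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # failures caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # successes confirmed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # failures missed

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def select_best(y_true: List[int],
                predictions: Dict[str, List[int]],
                metric: str = "f1") -> str:
    """Return the name of the candidate whose predictions score highest."""
    scores = {name: failure_metrics(y_true, pred)[metric]
              for name, pred in predictions.items()}
    return max(scores, key=scores.get)


# Toy, hand-made labels and predictions (hypothetical, for illustration only).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
candidates = {
    "model_a": [1, 0, 0, 1, 0, 0, 1, 0],  # misses one failure, no false alarms
    "model_b": [1, 1, 1, 1, 0, 1, 1, 0],  # catches every failure, two false alarms
}

best_by_f1 = select_best(y_true, candidates, metric="f1")
best_by_recall = select_best(y_true, candidates, metric="recall")
```

On this toy data the choice depends on the metric: `model_a` has the higher F1-score, while `model_b` has the higher recall. This is one reason to report several metrics side by side rather than a single number.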

Keywords