Informatics (Aug 2023)
Reinforcement Learning for Reducing the Interruptions and Increasing Fault Tolerance in the Cloud Environment
Abstract
Cloud computing delivers robust computational services by processing tasks on its virtual machines (VMs) using resource-scheduling algorithms. The cloud’s existing algorithms provide limited results due to inappropriate resource scheduling. Additionally, these algorithms cannot process tasks generating faults while being computed. The primary reason for this is that these existing algorithms need an intelligence mechanism to enhance their abilities. To provide an intelligence mechanism to improve the resource-scheduling process and provision the fault-tolerance mechanism, an algorithm named reinforcement learning-shortest job first (RL-SJF) has been implemented by integrating the RL technique with the existing SJF algorithm. An experiment was conducted in a simulation platform to compare the working of RL-SJF with SJF, and challenging tasks were computed in multiple scenarios. The experimental results convey that the RL-SJF algorithm enhances the resource-scheduling process by improving the aggregate cost by 14.88% compared to the SJF algorithm. Additionally, the RL-SJF algorithm provided a fault-tolerance mechanism by computing 55.52% of the total tasks compared to 11.11% of the SJF algorithm. Thus, the RL-SJF algorithm improves the overall cloud performance and provides the ideal quality of service (QoS).
Keywords