Applied Sciences (May 2024)
Design and Implementation of an Automated Disaster-Recovery System for a Kubernetes Cluster Using LSTM
Abstract
With the increasing importance of data in modern business environments, effective data management and protection strategies are gaining increasing research attention. Data protection in a cloud environment is crucial for safeguarding information assets and maintaining sustainable services. This study introduces a system structure that integrates Kubernetes management platforms with backup and restoration tools. This system is designed to immediately detect disasters and automatically recover applications from another Kubernetes cluster. The experimental results show that this system executes the restoration process within 15 s without human intervention, enabling rapid recovery. This, in turn, significantly reduces the potential for delays and errors compared to manual recovery processes, thereby enhancing data management and recovery efficiency in cloud environments. Moreover, our research model predicts the CPU utilization of the cluster using Long Short-Term Memory (LSTM). The necessity of scheduling through this predict is made clearer through comparison with experiments without scheduling, demonstrating its ability to prevent performance degradation. This research highlights the efficiency and necessity of automatic recovery systems in cloud environments, setting a new direction for future research.
Keywords