Sensors (Oct 2024)
Fault-Tolerant Scheduling Mechanism for Dynamic Edge Computing Scenarios Based on Graph Reinforcement Learning
Abstract
With the proliferation of Internet of Things (IoT) devices and edge nodes, edge computing has taken on much of the real-time data processing and low-latency response workload that was previously handled by cloud computing. However, edge computing often encounters challenges such as network instability and dynamic resource variations, which can lead to task interruptions or failures. Addressing these issues requires a fault-tolerant scheduling mechanism that keeps the system operating efficiently even when some nodes fail. In this paper, we propose a fault-tolerant scheduling model based on asynchronous graph reinforcement learning. The model builds a deep reinforcement learning framework on top of a graph neural network, allowing it to accurately capture the complex communication relationships between computing nodes and to output fault-tolerant scheduling actions that remain robust in dynamic environments. In addition, we introduce an asynchronous model update strategy in which multiple threads interact with the environment in parallel and frequently push their updates to the shared model, strengthening the model's capability for real-time dynamic scheduling. Experimental results demonstrate that the proposed method outperforms the baseline algorithms in terms of quality of service (QoS) assurance and fault-tolerant scheduling capabilities.
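For illustration only, the following minimal sketch suggests how a graph-based scheduling policy of the kind described above could score candidate edge nodes from the communication graph and pick a live node for the next task; it is not the paper's implementation, and the node features, the single mean-aggregation message-passing layer, and the `select_node` helper are all hypothetical choices.

```python
import numpy as np

def message_pass(adj, feats, w_self, w_neigh):
    """One round of mean-aggregation message passing over the node graph."""
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)
    neigh = adj @ feats / deg              # mean of neighbour features
    return np.tanh(feats @ w_self + neigh @ w_neigh)

def select_node(adj, feats, w_self, w_neigh, w_out, alive):
    """Score every edge node and pick a live one for the next task placement."""
    h = message_pass(adj, feats, w_self, w_neigh)
    logits = (h @ w_out).ravel()
    logits[~alive] = -np.inf               # mask failed nodes (fault tolerance)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(np.argmax(probs)), probs

# Toy example: 4 edge nodes, features = [cpu_free, mem_free, link_quality]
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 1, 0], [1, 0, 1, 1], [1, 1, 0, 1], [0, 1, 1, 0]], float)
feats = rng.random((4, 3))
alive = np.array([True, True, False, True])   # node 2 has failed
w_self = rng.normal(size=(3, 8))
w_neigh = rng.normal(size=(3, 8))
w_out = rng.normal(size=(8, 1))
node, probs = select_node(adj, feats, w_self, w_neigh, w_out, alive)
print("schedule task on node", node, "probs", np.round(probs, 3))
```

In a reinforcement learning setting such as the one the abstract describes, the weight matrices would be trained from scheduling rewards and updated asynchronously by several interaction threads rather than fixed at random as in this toy example.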
Keywords