Sensors (Oct 2024)
Fault-Tolerant Scheduling Mechanism for Dynamic Edge Computing Scenarios Based on Graph Reinforcement Learning
Abstract
With the proliferation of Internet of Things (IoT) devices and edge nodes, edge computing has taken on much of the real-time data processing and low-latency response workload that was previously handled by cloud computing. However, edge computing often encounters challenges such as network instability and dynamic resource variations, which can lead to task interruptions or failures. Addressing these issues requires a fault-tolerant scheduling mechanism that keeps the system operating efficiently even when some nodes fail. In this paper, we propose a fault-tolerant scheduling model based on asynchronous graph reinforcement learning. The model builds a deep reinforcement learning framework on top of a graph neural network, allowing it to accurately capture the complex communication relationships between computing nodes and to output fault-tolerant scheduling actions that remain robust in dynamic environments. In addition, we introduce an asynchronous model update strategy in which multiple threads interact with the environment in parallel and frequently push their updates to the shared model, strengthening the model's capability for real-time dynamic scheduling. Experimental results demonstrate that the proposed method outperforms the baseline algorithms in terms of quality of service (QoS) assurance and fault-tolerant scheduling capabilities.
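For illustration only, the following minimal sketch suggests how a graph-based scheduling policy of the kind described above could score candidate edge nodes from the communication graph and pick a live node for the next task; it is not the paper's implementation, and the node features, the single mean-aggregation message-passing layer, and the `select_node` helper are all hypothetical choices.

```python
import numpy as np

def message_pass(adj, feats, w_self, w_neigh):
    """One round of mean-aggregation message passing over the node graph."""
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)
    neigh = adj @ feats / deg              # mean of neighbour features
    return np.tanh(feats @ w_self + neigh @ w_neigh)

def select_node(adj, feats, w_self, w_neigh, w_out, alive):
    """Score every edge node and pick a live one for the next task placement."""
    h = message_pass(adj, feats, w_self, w_neigh)
    logits = (h @ w_out).ravel()
    logits[~alive] = -np.inf               # mask failed nodes (fault tolerance)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(np.argmax(probs)), probs

# Toy example: 4 edge nodes, features = [cpu_free, mem_free, link_quality]
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 1, 0], [1, 0, 1, 1], [1, 1, 0, 1], [0, 1, 1, 0]], float)
feats = rng.random((4, 3))
alive = np.array([True, True, False, True])   # node 2 has failed
w_self = rng.normal(size=(3, 8))
w_neigh = rng.normal(size=(3, 8))
w_out = rng.normal(size=(8, 1))
node, probs = select_node(adj, feats, w_self, w_neigh, w_out, alive)
print("schedule task on node", node, "probs", np.round(probs, 3))
```

In a reinforcement learning setting such as the one the abstract describes, the weight matrices would be trained from scheduling rewards and updated asynchronously by several interaction threads rather than fixed at random as in this toy example.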
Keywords