Journal of Universal Computer Science (Jan 2025)
Fault Tolerance Model for Hadoop Distributed System
Abstract
Fault tolerance approaches in distributed systems are essentially based on replication and checkpointing, each of which has its advantages and limitations. This paper has two objectives. First, it proposes a fault tolerance approach based on the status of the nodes in a distributed system. For this purpose, it defines three node statuses: safe, faulty, and potentially faulty. In addition to the classical node statuses (safe and faulty), it introduces a new status that we call potentially faulty. This new status makes it possible to enhance the availability of a distributed system. Second, it discusses the efficiency of the proposed model on two types of architecture: a virtual multi-node cluster and a physical multi-node cluster with a Wi-Fi connection. Experiments have shown that the proposed approach increases both the system's throughput and its level of fault tolerance.
Keywords