International Journal of Cognitive Computing in Engineering (Dec 2025)
Enhanced priority based task scheduling with integrated fault tolerance in distributed systems
Abstract
Fault tolerance in task scheduling is a fundamental challenge in cloud computing, particularly in dynamic environments where failures can disrupt task execution and degrade performance. Existing scheduling procedures lack effective mechanisms to adjust task priorities dynamically, leading to suboptimal performance. Moreover, Fault Tolerance (FT) is usually addressed through either replication or checkpointing, but rarely through both in a synergistic manner. Each technique has strengths, but each also suffers from significant weaknesses when used in isolation: replication increases cost and resource usage, while checkpointing incurs frequent storage overhead. This research addresses these limitations by proposing a dynamic priority-based task scheduling method with an integrated fault-tolerance scheme that combines the strengths of replication and checkpointing to achieve high recovery rates and efficient resource utilization. Replication supplies instant backup options, while checkpointing reduces recovery time and computational loss. Together they address the need for more adaptive and robust scheduling solutions in complex computing environments. This paper presents a novel Priority-Based Task Scheduling and Fault Tolerant model (PBTS-FT) that computes task priorities from response ratios, with the aim of optimizing resource allocation and system responsiveness. PBTS-FT is accompanied by an integrated FT scheme that combines replication and checkpointing, improving the system's resilience by handling both VM failures and task failures. Experiments were carried out in two setups and indicate enhanced Quality of Service (QoS) in terms of recovery, makespan, speedup, efficiency, resource utilization, and progress.
Simulation experiments support these claims, showing recovery improvements of 60.75 %, 63.75 %, and 8.79 % compared to the Replication, Checkpointing, and FTMSFIS methods, respectively. Compared to Modified Min Min, HEFT, and PETS, the proposed PBTS-FT improves resource utilization by about 6.27 %, 12.89 %, and 14.20 %, respectively. This provides empirical evidence that a proactive choice of FT methods substantially boosts the task completion success rate and the other QoS parameters.
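To make the idea of response-ratio priorities concrete, the following is a minimal sketch, not the PBTS-FT algorithm itself. The abstract does not give the priority formula, so the sketch assumes the classic Highest Response Ratio Next definition, (waiting time + estimated execution time) / estimated execution time, on a single non-preemptive VM. The names `Task`, `response_ratio`, and `schedule` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    arrival: float  # time at which the task becomes available
    burst: float    # estimated execution time (assumed known in advance)


def response_ratio(task: Task, now: float) -> float:
    """Assumed priority: (waiting + burst) / burst, as in classic HRRN."""
    waiting = max(0.0, now - task.arrival)
    return (waiting + task.burst) / task.burst


def schedule(tasks: list[Task]) -> list[str]:
    """Return the execution order on one non-preemptive VM."""
    pending = sorted(tasks, key=lambda t: t.arrival)
    now = 0.0
    order = []
    while pending:
        ready = [t for t in pending if t.arrival <= now]
        if not ready:
            # VM is idle: jump ahead to the next arrival.
            now = pending[0].arrival
            continue
        # Priorities are recomputed at every decision point, so a task
        # that has waited longer gradually overtakes shorter newcomers.
        nxt = max(ready, key=lambda t: response_ratio(t, now))
        pending.remove(nxt)
        now += nxt.burst
        order.append(nxt.name)
    return order


if __name__ == "__main__":
    demo = [Task("A", 0, 3), Task("B", 1, 10), Task("C", 2, 1)]
    print(schedule(demo))
```

Because the ratio grows with waiting time, short tasks are favored without starving long ones, which is one plausible reason to base dynamic priorities on response ratios. The fault-tolerance side (replication plus checkpointing) is deliberately left out, since the abstract does not describe how the two are combined.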