IEEE Access (Jan 2021)
Fully Adaptive Stochastic Handling of Soft-Errors in Real-Time Systems
Abstract
In the design of real-time systems, it is becoming increasingly important to take soft-error tolerance into account. While hardening techniques such as error detection and error correction enable us to build systems that better tolerate soft-errors, they inevitably incur execution time overhead. To mitigate the impact of the increased execution time, Chen et al. proposed to identify the $(m,k)$-constraints of real-time tasks, which demand that at least $m$ jobs out of any $k$ consecutive task invocations be fault-free, and to design hardening policies based on these constraints. In this paper, we propose a new method to design hardening policies that are adaptive and stochastic. At the heart of our method is a new linear program (LP) formulation that finds an adaptive stochastic policy optimizing the CPU utilization. At design time, we first identify the task set information of a given system, verify the system schedulability, and solve LPs to find an optimal policy. This policy is represented as a look-up table that specifies the stochastic hardening decisions as a function of the past execution history of the system. At run time, hardening decisions are made simply by looking up this table. The proposed method finds hardening policies that adaptively react to the execution history of the system, allowing improvement in the CPU utilization. By devising the notion of stochastic hardening policies, the method also departs from the view of previous approaches that reliability must be assessed in an all-or-nothing manner. We evaluated the effectiveness of the proposed method using various task sets. In a set of 2,050 benchmarks, the system’s CPU utilization was improved by 2.80% to 7.16% on average under different configurations. The improvement was as high as 18.45% in the best benchmark.
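The two mechanisms described above, the $(m,k)$-constraint and the run-time table look-up, can be illustrated with a minimal sketch. It is not the paper's implementation. The function names, the encoding of the history as a tuple of the most recent job outcomes, and the probabilities in the toy table are all assumptions made for illustration; in the proposed method the table entries would come from solving the LPs at design time.

```python
import random

def satisfies_mk(history, m, k):
    """Check an (m,k)-constraint on a sequence of job outcomes.

    history: list of booleans, True meaning the job was fault-free.
    Returns True if every window of k consecutive jobs contains at
    least m fault-free jobs. Histories shorter than k have no full
    window and are treated as satisfying the constraint.
    """
    if len(history) < k:
        return True
    return all(sum(history[i:i + k]) >= m
               for i in range(len(history) - k + 1))

def decide_hardening(policy, recent, rng=random):
    """Sample a stochastic hardening decision from a look-up table.

    policy: dict mapping a tuple of recent outcomes to the probability
            of hardening the next job (hypothetical encoding).
    recent: the most recent outcomes, used as the table key.
    """
    p = policy[tuple(recent)]
    return rng.random() < p

# Toy table for k = 3, keyed on the last two outcomes.
# The probabilities are illustrative, not results from the paper.
toy_policy = {
    (True, True): 0.0,    # recent jobs fault-free: skip hardening
    (True, False): 0.7,
    (False, True): 0.7,
    (False, False): 1.0,  # two recent faults: always harden
}
```

With this encoding, the run-time cost of a decision is one dictionary look-up and one random draw, which matches the abstract's point that decisions are made simply by consulting the table.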
Keywords