IEEE Access (Jan 2020)
Peak-Power-Aware Primary-Backup Technique for Efficient Fault-Tolerance in Multicore Embedded Systems
Abstract
Multicore platforms offer great potential for task-level redundancy, which can exploit idle cores to achieve a degree of fault tolerance (reliability) in embedded systems. However, due to the Thermal Design Power (TDP) constraint, it may not be possible to power on all cores of a multicore chip simultaneously at full throttle (e.g., in ARM's big.LITTLE architecture). Since TDP is the maximum sustainable power that a chip can safely dissipate (as specified by the chip vendor), violating it triggers a performance-throttling mechanism (e.g., lowering the operating voltage and frequency, or power-gating cores with task migration) to avoid overheating. Such throttling can significantly affect the timeliness of the system and hence poses a serious challenge to using off-the-shelf multicore platforms in real-time embedded systems when exploiting them for full-scale reliability: under a given TDP constraint, only a few tasks can afford to run in a fully reliable mode. In this article, we first study the power consumption of task-level redundancy on multicore platforms. Then, to tackle the peak power problem, we propose a novel primary-backup scheme for power-aware scheduling of real-time tasks on core pairs in multicore systems. The proposed scheme aims to remove overlaps between the peak power of concurrently executing tasks so that power consumption stays below the chip-level TDP constraint, which facilitates higher reliability levels within a given power budget. To this end, taking the tasks' power profiles into account, we propose a task partitioning method along with maximum-peak-power-first (MPPF) and maximum-peak-power-last (MPPL) policies to schedule the original (primary) and redundant (backup) copies of tasks, respectively. Our experiments show that our technique reduces peak power by up to 50% (29.5% on average) compared to state-of-the-art schemes, while providing the same reliability level.
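The intuition behind pairing MPPF with MPPL can be illustrated with a toy two-core timeline. The sketch below is a minimal illustration under simplifying assumptions that are not taken from the article: each task's power profile is a hypothetical list of per-time-unit power values, the primary and backup copies run concurrently on the two cores of a single core pair with no gaps, idle power is taken as 0 W, and MPPL is approximated simply as the reverse of the MPPF order. The helper names (`timeline`, `chip_peak`, `mppf_mppl`) are hypothetical; this is not the authors' partitioning or scheduling algorithm.

```python
from typing import Dict, List, Tuple

# Hypothetical power profile: power (W) drawn in each time unit of one task execution.
PowerProfile = List[float]


def timeline(order: List[str], profiles: Dict[str, PowerProfile]) -> List[float]:
    """Run tasks back-to-back on one core and return its per-time-unit power trace."""
    trace: List[float] = []
    for name in order:
        trace.extend(profiles[name])
    return trace


def chip_peak(a: List[float], b: List[float]) -> float:
    """Peak of the summed power of two cores; the shorter trace is padded with 0 W (assumed idle power)."""
    n = max(len(a), len(b))
    a = a + [0.0] * (n - len(a))
    b = b + [0.0] * (n - len(b))
    return max(x + y for x, y in zip(a, b))


def mppf_mppl(profiles: Dict[str, PowerProfile]) -> Tuple[List[str], List[str]]:
    """Primary core: maximum peak power first. Backup core: maximum peak power last
    (approximated here as the reverse of the MPPF order)."""
    primary = sorted(profiles, key=lambda t: max(profiles[t]), reverse=True)
    backup = list(reversed(primary))
    return primary, backup


# Toy task set (hypothetical values).
profiles = {"A": [8.0, 8.0], "B": [5.0, 5.0], "C": [2.0, 2.0]}

# Baseline: both copies of each task run in the same order, so identical peaks overlap.
naive_order = ["A", "B", "C"]
naive_peak = chip_peak(timeline(naive_order, profiles), timeline(naive_order, profiles))

# Peak-power-aware ordering: high-power primaries overlap with low-power backups.
primary, backup = mppf_mppl(profiles)
aware_peak = chip_peak(timeline(primary, profiles), timeline(backup, profiles))

print(primary, backup, naive_peak, aware_peak)
```

With the high-power task A placed first on the primary core and last on the backup core, its peak coincides with the low-power task C rather than with its own copy, so in this toy example the summed peak drops from 16 W to 10 W. Real power profiles are not flat, and the article's scheme also partitions tasks across core pairs, so this sketch only conveys the ordering idea.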
Keywords