IEEE Access (Jan 2024)
Dynamic Triple Modular Redundancy in Interleaved Hardware Threads: An Alternative Solution to Lockstep Multi-Cores for Fault-Tolerant Systems
Abstract
Over the years, significant work has been done on high-integrity systems, such as those found in cars, satellites and aircraft, to minimize the risk that a logic fault causes a system failure; functional safety is a key requirement for such systems. In this study, we combine the benefits of Dual Modular Redundancy and Triple Modular Redundancy within an Interleaved-Multi-Threading microprocessor core, by means of a microarchitecture design capable of dynamically switching from Dual Modular Redundancy to Triple Modular Redundancy when a fault occurs. We report the quantitative results of an extensive fault injection simulation campaign that compares the fault-tolerance capabilities of the fault-tolerant core with those of its previous version. The results show that, in several application cases, the proposed core offers better fault resilience and lower hardware and timing overhead than the lockstep-based dual-core approach. The proposed technique achieves 98.6% fault mitigation at the expense of only 4 clock cycles of roll-back overhead, with no checkpointing redundancy.
Keywords