IEEE Access (Jan 2022)
On-Die Dynamic Remapping Cache: Strong and Independent Protection Against Intermittent Faults
Abstract
As process scaling continues, DRAM is becoming more vulnerable to errors. System companies and DRAM vendors have introduced ECC to protect DRAM against the growing number of errors. ECC, however, should be combined with a repair mechanism to prevent non-transient faults from repeatedly producing errors and to prevent overlapping faults from accumulating into more severe errors. We propose a novel approach that repairs memory and improves reliability with minimal overhead. On-die Dynamic Remapping Cache (DRC) minimizes repair overhead by focusing on active faults. Most intermittent faults (e.g., Variable Retention Time faults) generate errors only occasionally, so the number of faults active at any one time is significantly lower than the total number of faults. DRC tracks the activity and severity of faults at run-time and uses a small cache inside the DRAM to remap active faults. This efficiency enables aggressive remapping of bit faults, which eliminates fault accumulation and improves reliability. Our evaluation shows that DRC can provide much stronger protection than state-of-the-art protection schemes with no performance degradation and negligible chip area overhead.
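The idea of remapping only the currently active faults can be illustrated with a toy model. The sketch below is not the DRC design evaluated in this work; it only assumes a small fixed-capacity table that counts errors per faulty address, evicts the least active entry when full, and periodically ages its counters. The names `RemapCache`, `report_error`, `decay`, and `is_remapped` are hypothetical, and fault severity is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class RemapCache:
    """Toy remapping table: holds only the most active faulty addresses."""
    capacity: int  # hypothetical number of spare (remap) entries
    entries: dict = field(default_factory=dict)  # faulty address -> activity count

    def report_error(self, addr: int) -> None:
        """Record an error at addr; allocate a remap entry if it has none."""
        if addr in self.entries:
            self.entries[addr] += 1
            return
        if len(self.entries) >= self.capacity:
            # Assumed policy: evict the fault with the lowest activity count.
            victim = min(self.entries, key=self.entries.get)
            del self.entries[victim]
        self.entries[addr] = 1

    def decay(self) -> None:
        """Age every counter by one and release entries that reach zero."""
        for addr in list(self.entries):
            self.entries[addr] -= 1
            if self.entries[addr] <= 0:
                del self.entries[addr]

    def is_remapped(self, addr: int) -> bool:
        """Return True if accesses to addr would be served from a spare entry."""
        return addr in self.entries


cache = RemapCache(capacity=2)
for faulty_addr in (0x10, 0x10, 0x20, 0x30):
    cache.report_error(faulty_addr)
```

In this toy run the table holds two entries, so the third distinct address displaces the least active one. Because intermittent faults are active only occasionally, a table far smaller than the total fault count can still cover the faults that matter at a given moment, which is the intuition the abstract describes.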
Keywords