IEEE Access (Jan 2020)
A Thermal-Aware On-Line Fault Tolerance Method for TSV Lifetime Reliability in 3D-NoC Systems
Abstract
Through-Silicon-Via (TSV) based 3D Integrated Circuits (3D-IC) are one of the most advanced architectures by providing low power consumption, shorter wire length and smaller footprint. However, 3D-ICs confront lifetime reliability due to high operating temperature and interconnect reliability, especially the Through-Silicon-Via (TSV), which can significantly affect the accuracy of the applications. In this paper, we present an online method that supports the detection and correction of lifetime TSV failures, named IaSiG. By reusing the conventional recovery method and analyzing the output syndromes, IaSiG can determine and correct the defective TSVs. Results show that within a group, R redundant TSVs can fully localize and correct R defects and support the detection of R+1 defects. Moreover, by using G groups, it can localize up to GxR and detect up to G x (R + 1) defects. An implementation of IaSiG for 32-bit data in eight groups and two redundancies has a worst-case execution time (WCET) of 5,152 cycles while supporting at most 16 defective TSVs (50% localization). By integrating IaSiG onto a 3D Network-on-Chip, we also perform a grid-search based empirical method to insert suitable numbers of redundancies into TSV groups. The empirical method takes the operating temperature as the factor of accelerated fault due to the fact that temperature is one of the major issues of 3D-ICs. The results show that the proposed method can reduce the number of redundancies from the uniform method while still maintaining the required Mean Time to Failure.
Keywords