Nihon Kikai Gakkai ronbunshu (Jan 2024)

Development of accelerated test method for developing failure prediction technology for SoC-equipped boards and application to failure prediction

  • Zhi WANG,
  • Hiroshi SHINTANI,
  • Rintaro MINAMITANI,
  • Takanori SANJYOU

DOI
https://doi.org/10.1299/transjsme.23-00108
Journal volume & issue
Vol. 90, no. 929
pp. 23-00108 – 23-00108

Abstract

In recent years, attention has focused on failure prediction technology for electronic devices, which helps ensure the stable operation of infrastructure by warning of device failures before they occur and prompting early maintenance that minimizes downtime. In contrast to HDDs, which are storage media, there has been little research on SoCs and CPUs, which are arithmetic units. One reason is that SoCs and CPUs have long lives and low failure rates because they have fewer mechanical parts. In addition, the number of SoCs or CPUs used in infrastructure control boards is small. As a result, it is difficult to obtain log data and failure data for SoCs and CPUs. The purpose of this research is therefore to establish an accelerated test method that causes an SoC to fail in a short time, with a failure mode similar to that of devices operating under normal conditions, and to acquire log data and failure data for the further development of failure prediction technology. By establishing a system that locally heats the SoC to promote failure, a large amount of SoC failure data and log data can be acquired in a short time. Finally, we applied the 17 acquired sets of failure data and log data to failure prediction for the control board SoC. Using the machine learning algorithm OC-SVM (one-class support vector machine) with CPU calculation time as the feature, it was shown that SoC failure can be detected within 5% of the remaining life, with a true detection rate of 47.1% and a false detection rate of 17.6%. Based on this result, the failure prediction technology developed in this research is considered to have a certain level of value for field use in combination with a strengthened redundancy system.
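
As a rough illustration of the detection step described above, the following sketch fits a one-class SVM on CPU calculation times from a healthy period and flags later samples that fall outside the learned region. It is not the authors' implementation: the function name `flag_anomalies`, the RBF kernel, the `nu` value, and the synthetic timing data are all assumptions made for this example, and scikit-learn's `OneClassSVM` stands in for whatever OC-SVM implementation the study used.

```python
import numpy as np
from sklearn.svm import OneClassSVM


def flag_anomalies(healthy_times, monitored_times, nu=0.05):
    """Return a boolean array marking monitored samples judged anomalous.

    healthy_times and monitored_times are 1-D sequences of CPU calculation
    times (hypothetical units). The model is trained on healthy data only.
    """
    healthy = np.asarray(healthy_times, dtype=float).reshape(-1, 1)
    monitored = np.asarray(monitored_times, dtype=float).reshape(-1, 1)

    # nu bounds the fraction of training samples treated as outliers;
    # 0.05 is an arbitrary choice for this sketch, not a value from the paper.
    model = OneClassSVM(kernel="rbf", gamma="scale", nu=nu)
    model.fit(healthy)

    # predict() returns +1 for inliers and -1 for outliers.
    return model.predict(monitored) == -1


# Synthetic data (assumed): stable calculation times while healthy,
# then a clear slowdown in the last few samples before failure.
rng = np.random.default_rng(0)
healthy = rng.normal(loc=1.0, scale=0.02, size=200)
monitored = np.concatenate([rng.normal(1.0, 0.02, size=95), np.full(5, 1.5)])

flags = flag_anomalies(healthy, monitored)
print("flagged sample indices:", np.flatnonzero(flags))
```

Training on healthy data only suits the setting in the abstract, where failure records are scarce (17 sets) while normal-operation logs are comparatively easy to collect.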

Keywords