Informatika (Jul 2021)

The study of the reliability of the hardware part of the office cluster

  • T. S. Martinovich,
  • N. N. Paramonov,
  • A. G. Rymarchuk,
  • O. P. Tchij

DOI
https://doi.org/10.37661/1816-0301-2021-18-2-48-57
Journal volume & issue
Vol. 18, no. 2
pp. 48 – 57

Abstract

The reliability measures of the hardware part of an office cluster were studied using the example of the SKIF-GEO-Office RB cluster (hereafter “the cluster”), developed within the scientific and technical program "SKIF-NEDRA" (2015-2018, a program of the Union State of Russia and Belarus). The cluster components are housed in a small rack based on a full-tower case of the "Aerocool Expredator Black" type. The basic architectural principles implemented in the cluster, its composition, and its structural and functional scheme are presented. The methodological support for calculating the reliability of the cluster, based on the authors' previous studies, and its structural reliability scheme are justified. The choice of the main reliability measures for the cluster core and for the set of computing facilities is justified, and formulas for calculating these measures are given. The consequences of failures of the cluster's component parts are analyzed. A mathematical model of reliability (a state graph) of the cluster's set of computing facilities is proposed, which makes it possible to derive formulas for the average values of the time to failure and the time to interruption of the cluster. The reliability of the cluster as a whole is estimated by calculating reliability measures from reference data on component reliability and from operating data on supercomputers of the SKIF family. The reliability measures of the cluster are calculated.
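
As an illustration only of how a state graph can yield a mean time to failure, the following minimal sketch assumes a generic birth-death Markov model, which is not taken from the paper: identical computing nodes fail independently at a constant rate, a single repair facility restores one node at a time, and the set of computing facilities is considered failed once a given number of nodes are down. The function name, parameters, and numeric rates are all hypothetical.

```python
def mean_time_to_failure(n_nodes, max_failed, failure_rate, repair_rate):
    """Mean time until `max_failed` nodes are down, starting with all nodes up.

    Assumed model (hypothetical, not the authors'): state i = number of failed
    nodes; transition i -> i+1 at rate (n_nodes - i) * failure_rate; transition
    i -> i-1 at rate repair_rate for i > 0; state `max_failed` is absorbing.
    """
    if not 0 < max_failed <= n_nodes:
        raise ValueError("max_failed must be between 1 and n_nodes")
    if failure_rate <= 0 or repair_rate < 0:
        raise ValueError("failure_rate must be > 0 and repair_rate >= 0")

    total = 0.0  # accumulated mean time from state 0 to the absorbing state
    step = 0.0   # mean time to first pass from state i to state i + 1
    for failed in range(max_failed):
        up = (n_nodes - failed) * failure_rate
        down = repair_rate if failed > 0 else 0.0
        # First-passage recursion for a birth-death chain:
        # h_i = (1 + down_i * h_{i-1}) / up_i
        step = (1.0 + down * step) / up
        total += step
    return total


if __name__ == "__main__":
    # Hypothetical rates per hour: two nodes, failure only when both are down.
    print(mean_time_to_failure(2, 2, failure_rate=0.01, repair_rate=0.1))
```

For two nodes with repair, the recursion reproduces the standard closed form (3λ + μ) / (2λ²) for a two-unit redundant system, which is a convenient sanity check on the sketch.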

Keywords