Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki (Jun 2023)

Assessment of the readiness of a computer system for timely servicing of requests when combined with information recovery of memory after failures

  • Vladimir A. Bogatyrev,
  • Stanislav V. Bogatyrev,
  • Anatoly V. Bogatyrev

DOI
https://doi.org/10.17586/2226-1494-2023-23-3-608-617
Journal volume & issue
Vol. 23, no. 3
pp. 608 – 617

Abstract

This paper investigates ways to increase the readiness of a redundant computer system to execute delay-critical requests on time. It considers a fault-tolerant computer cluster whose nodes are duplicated computing systems that combine computing nodes and memory nodes. Memory nodes are assumed to recover in two stages: first physically, then informationally, with the informational stage using the resources of the computing nodes. The novelty of the approach is that, for systems with a limit on the allowable service time of functional requests, it evaluates how recovery disciplines affect system readiness under various options for dividing computing resources between restoring information after memory failures and performing the required functions. The reliability of the systems under study is assessed not only by the probability that they are ready to perform functional tasks (the availability factor), but also by the probability that they are ready to perform those tasks in a timely manner. The choice of disciplines for recovery and for servicing the flow of functional requests is justified on the basis of Markov models. The proposed models account for how the division of computing resources affects both the performance of the required functions and the information recovery of memory carried out after its physical recovery. Choosing maintenance disciplines with the proposed Markov model aims at a compromise between increasing the availability factor and increasing the probability of timely execution of the incoming flow of functional requests.
The paper justifies the choice of options for distributing the computing resources that remain operational after failures between servicing functional requests (the required functions) and the information recovery of memory performed after its physical recovery. Based on the proposed Markov models, it investigates how the system's readiness for timely execution of requests depends on how the remaining computing resources are divided between restoring information in memory and performing functional tasks. This dependence is studied as a function of the allowable waiting time of functional requests and the intensity of their traffic. The paper also analyzes how balancing the traffic of functional tasks between functional computing nodes affects the system's readiness for timely execution of requests, taking into account the options for jointly using those nodes for the information recovery of memory nodes after their physical recovery. It shows that an optimal share of traffic distribution between computing nodes exists, given the options for dividing their resources between servicing functional requests and restoring information in memory nodes after their physical recovery. The results can be used to justify the choice of disciplines for servicing functional requests and for recovery after failures in fault-tolerant cluster systems that are critical to delays in executing functional requests.
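
The trade-off described in the abstract can be illustrated with a small numerical sketch. This is not the authors' model: the four-state chain, the M/M/1 waiting-time formula, the quasi-static weighting of states, and every parameter value and function name below are assumptions chosen only to show how an availability factor and a probability of timely service might both depend on the share of computing resources left for functional requests during information recovery.

```python
import math


def steady_state(lam, mu_phys, mu_info, mu_down, share):
    """Stationary probabilities of a toy 4-state Markov chain (assumed model).

    State 0: both memory nodes of the duplicated system are up.
    State 1: one memory node is in physical recovery.
    State 2: one memory node is in information recovery.
    State 3: no valid memory copy remains, the system is down.
    `share` is the fraction of computing resources kept for functional
    requests during state 2; the rest drives information recovery.
    """
    r = mu_info * (1.0 - share)          # information-recovery rate
    p0 = 1.0                             # unnormalised, from balance equations
    p1 = 2.0 * lam * p0 / (mu_phys + lam)
    p2 = mu_phys * p1 / (r + lam)
    p3 = lam * (p1 + p2) / mu_down
    total = p0 + p1 + p2 + p3
    return [p / total for p in (p0, p1, p2, p3)]


def p_timely(arrival, service, t_max):
    """P(queueing delay <= t_max) for an M/M/1 FCFS queue; 0 if overloaded."""
    if service <= arrival:
        return 0.0
    rho = arrival / service
    return 1.0 - rho * math.exp(-(service - arrival) * t_max)


def availability(pi):
    """Availability factor: probability that the system is not down."""
    return 1.0 - pi[3]


def timely_readiness(pi, arrival, service, share, t_max):
    """Probability that the system is up AND a request waits at most t_max.

    Quasi-static assumption: the queue reaches its own steady state
    within each reliability state.
    """
    full = p_timely(arrival, service, t_max)
    reduced = p_timely(arrival, share * service, t_max)
    return (pi[0] + pi[1]) * full + pi[2] * reduced


# Hypothetical parameters (rates per hour, allowable waiting time in hours).
LAM, MU_PHYS, MU_INFO, MU_DOWN = 0.001, 0.5, 1.0, 0.05
ARRIVAL, SERVICE, T_MAX = 50.0, 100.0, 0.05

SHARES = [i / 100 for i in range(101)]
scores = {}
for s in SHARES:
    pi = steady_state(LAM, MU_PHYS, MU_INFO, MU_DOWN, s)
    scores[s] = timely_readiness(pi, ARRIVAL, SERVICE, s, T_MAX)

best_share = max(scores, key=scores.get)
print(f"best share for functional requests: {best_share:.2f}")
print(f"timely readiness at that share:     {scores[best_share]:.6f}")
```

In this toy setting, giving recovery too little capacity prolongs the non-redundant state and lowers availability, while giving it too much slows functional requests during recovery, so the sweep tends to find its maximum strictly between the two extremes. Different assumed rates would move that point.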

Keywords