EPJ Web of Conferences (Jan 2020)

ANaN — ANalyse And Navigate: Debugging Compute Clusters with Techniques from Functional Programming and Text Stream Processing

  • Adler Alexander,
  • Kebschull Udo

DOI
https://doi.org/10.1051/epjconf/202024501041
Journal volume & issue
Vol. 245
p. 01041

Abstract

Read online

Monitoring is an indispensable tool for the operation of any large installation of grid or cluster computing, be it high energy physics or elsewhere. Usually, monitoring is configured to collect a small amount of data, just enough to enable detection of abnormal conditions. Once detected, the abnormal condition is handled by gathering all information from the affected components. This data is processed by querying it in a manner similar to a database. This contribution shows how the metaphor of a debugger (for software applications) can be transferred to a compute cluster. The concepts of variables, assertions and breakpoints that are used in debugging can be applied to monitoring by defining variables as the quantities recorded by monitoring and breakpoints as invariants formulated via these variables. It is found that embedding fragments of a data extracting and reporting tool such as the UNIX tool awk facilitates concise notations for commonly used variables since tools like awk are designed to process large event streams (in textual representations) with bounded memory. A functional notation similar to both the pipe notation used in the UNIX shell and the point-free style used in functional programming simplify the combination of variables that commonly occur when formulating breakpoints.