EPJ Web of Conferences (Jan 2019)

Advanced Analytics service to enhance workflow control at the ATLAS Production System

  • Titov Mikhail,
  • Borodin Mikhail,
  • Golubkov Dmitry,
  • Klimentov Alexei

DOI
https://doi.org/10.1051/epjconf/201921403007
Journal volume & issue
Vol. 214
p. 03007

Abstract


Modern workload management systems responsible for central data production and processing in High Energy and Nuclear Physics experiments have highly complicated architectures and require a specialized control service to balance resources and processing components. Such a service comprises a comprehensive set of analytical tools, management utilities and monitoring views aimed at providing a deep understanding of internal processes, and is considered an extension of the situational awareness analytic service. Its key functions are: analysis of task processing, e.g., selection and regulation of the key task features that affect processing the most; modeling of processed data life-cycles for further analysis, e.g., to generate guidelines for a particular stage of data processing; and forecasting of processes, with a focus on data and task states as well as on the management system itself, e.g., to detect the source of any potential malfunction. The prototype of the advanced analytics service will be an essential part of the analytical service of the ATLAS Production System (ProdSys2). The advanced analytics service uses tools such as Time-To-Complete (TTC) estimation for units of processing (i.e., tasks and chains of tasks) to control the processing state and to highlight abnormal operations and executions. The obtained metrics are used in decision-making processes to regulate system behaviour and resource consumption.