EPJ Web of Conferences (Jan 2024)
Analysis Grand Challenge benchmarking tests on selected sites
Abstract
Fast turn-around times and ease of use are important factors for systems supporting the analysis of large HEP data samples. We study and compare multiple technical approaches. This article describes the setup and benchmarking of the Analysis Grand Challenge (AGC) [1] using CMS Open Data. The AGC is an effort to provide a realistic physics analysis with the intent of showcasing the functionality, scalability and feature-completeness of the Scikit-HEP Python ecosystem. We present the results of setting up the necessary software environment for the AGC and benchmarking the analysis run time on various computing clusters: the institute SLURM cluster at LMU Munich, a SLURM cluster at LRZ (a WLCG Tier-2 site) and the analysis facility Vispa [2], operated by RWTH Aachen. Each site provides a slightly different software environment and mode of operation, which poses interesting challenges for the flexibility of a setup such as the one intended for the AGC. Comparing these benchmarks to each other also provides insights into different storage and caching systems. At LRZ and LMU, regular Grid storage (HDD) is available alongside an SSD-based XCache server, while Vispa uses a sophisticated per-node caching system.
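To illustrate the kind of setup being benchmarked, the following is a minimal sketch (not the authors' exact configuration) of distributing an AGC-style analysis over a SLURM cluster with Dask and timing it end to end. The partition name, resource requests, and input file list are placeholder assumptions; the actual AGC analysis processes CMS Open Data with the Scikit-HEP stack rather than the trivial event count shown here.

```python
# Minimal sketch, assuming dask_jobqueue is available on the cluster.
# Queue name, resources, and file paths below are hypothetical.
import time

from dask.distributed import Client
from dask_jobqueue import SLURMCluster

cluster = SLURMCluster(
    cores=4,            # cores per SLURM worker job
    memory="8GB",       # memory per SLURM worker job
    walltime="01:00:00",
    queue="normal",     # hypothetical partition name
)
cluster.scale(jobs=10)  # request 10 worker jobs from SLURM
client = Client(cluster)

def n_events(path):
    """Open one ROOT file with uproot and count its events."""
    import uproot  # imported on the worker
    with uproot.open(path) as f:
        return f["Events"].num_entries

files = ["/path/to/cms_open_data_file.root"]  # placeholder input list

start = time.perf_counter()
total = sum(client.gather(client.map(n_events, files)))
print(f"{total} events processed in {time.perf_counter() - start:.1f} s")
```

In a benchmark like the one described above, the wall-clock time of such a run would be compared across sites and across storage back ends (Grid HDD storage, an SSD-based XCache, or a per-node cache) with the analysis code held fixed.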