EPJ Web of Conferences (Jan 2024)

CMSSW Scaling Limits on Many-Core Machines

  • Jones Christopher,
  • Gartung Patrick

DOI
https://doi.org/10.1051/epjconf/202429503008
Journal volume & issue
Vol. 295
p. 03008

Abstract

Read online

Today the LHC offline computing relies heavily on CPU resources, despite the interest in compute accelerators, such as GPUs, for the longer term future. The number of cores per CPU socket has continued to increase steadily, reaching the levels of 64 cores (128 threads) with recent AMD EPYC processors, and 128 cores on Ampere Altra Max ARM processors. Over the course of the past decade, the CMS data processing framework, CMSSW, has been transformed from a single-threaded framework into a highly concurrent one. The first multithreaded version was brought into production by the start of the LHC Run 2 in 2015. Since then, the framework’s threading efficiency has gradually been improved by adding more levels of concurrency and reducing the amount of serial code paths. The latest addition was support for concurrent Runs. In this work we review the concurrency model of the CMSSW, and measure its scalability with real CMS applications, such as simulation and reconstruction, on modern many-core machines. We show metrics such as event processing throughput and application memory usage with and without the contribution of I/O, as I/O has been the major scaling limitation for the CMS applications.