EPJ Web of Conferences (Jan 2024)

Darshan for HEP applications

  • Wang Rui,
  • Snyder Shane,
  • Benjamin Douglas,
  • Dong Zhihua,
  • Gartung Patrick,
  • Herner Kenneth

DOI
https://doi.org/10.1051/epjconf/202429505003
Journal volume & issue
Vol. 295
p. 05003

Abstract

Modern high-energy physics (HEP) workflows must manage increasingly large and complex data collections, and high-performance computing (HPC) facilities may be employed to help meet their growing data processing needs. However, meeting the performance expectations of HPC systems requires a better understanding of these workflows’ I/O patterns and underlying bottlenecks. Darshan is a lightweight I/O characterization tool that captures concise views of HPC application I/O behavior. It intercepts application I/O calls at runtime, records file access statistics for each process, and generates log files detailing the application’s I/O access patterns. Typical HEP workflows include event generation, detector simulation, event reconstruction, and subsequent analysis stages. This paper presents a Darshan-based study of the I/O behavior of the ATLAS simulation and filtering stage and of the CMS simulation workflow, including insights into their I/O operations and data access sizes.
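
The idea of intercepting I/O calls and keeping per-file statistics, as described in the abstract, can be illustrated with a small conceptual sketch in Python. This is not Darshan's implementation or API: Darshan interposes on I/O library calls of the running application, whereas the sketch below simply wraps a file object. The class name `CountingFile`, the file name `events.dat`, and the histogram bin edges are all hypothetical choices made for illustration.

```python
import io
from collections import Counter, defaultdict

# Hypothetical access-size bins in bytes (illustrative, not Darshan's bins).
SIZE_BINS = [(0, 100), (100, 1024), (1024, 10 * 1024), (10 * 1024, 100 * 1024)]


def size_bin(nbytes):
    """Return a label for the histogram bin that contains nbytes."""
    for lo, hi in SIZE_BINS:
        if lo <= nbytes < hi:
            return f"{lo}-{hi}"
    return f"{SIZE_BINS[-1][1]}+"


class CountingFile:
    """Wrap a binary file object and tally calls, bytes, and access sizes."""

    def __init__(self, raw, name, stats):
        self._raw = raw
        self._rec = stats[name]  # one counter record per file name

    def read(self, size=-1):
        data = self._raw.read(size)
        self._rec["reads"] += 1
        self._rec["bytes_read"] += len(data)
        self._rec["read_size_" + size_bin(len(data))] += 1
        return data

    def write(self, data):
        written = self._raw.write(data)
        self._rec["writes"] += 1
        self._rec["bytes_written"] += written
        self._rec["write_size_" + size_bin(written)] += 1
        return written

    def seek(self, offset, whence=io.SEEK_SET):
        self._rec["seeks"] += 1
        return self._raw.seek(offset, whence)


# Per-file statistics, analogous in spirit to per-file records in a log.
stats = defaultdict(Counter)

# An in-memory buffer stands in for a real file in this sketch.
f = CountingFile(io.BytesIO(), "events.dat", stats)
f.write(b"x" * 2048)  # one 2 KiB write
f.seek(0)
f.read(512)           # two small reads
f.read(512)
f.read(4096)          # returns only the remaining 1024 bytes

for name, record in stats.items():
    print(name, dict(record))
```

The access-size histogram is keyed on the number of bytes actually transferred rather than the number requested, so the final short read lands in the 1024-10240 bin. Summaries of this kind (operation counts, bytes moved, and access-size distributions per file) are the sort of information the abstract refers to as insights into I/O operations and data access size.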