EPJ Web of Conferences (Jan 2024)

EPN2EOS Data Transfer System

  • Şuiu Alice Florenţa,
  • Grigoraş Costin,
  • Weisz Sergiu,
  • Betev Latchezar

DOI
https://doi.org/10.1051/epjconf/202429501023
Journal volume & issue
Vol. 295
p. 01023

Abstract

ALICE is one of the four large experiments at the CERN LHC, designed to study the structure and origins of matter in collisions of heavy ions and protons at ultra-relativistic energies. To collect, store, and process the experimental data, ALICE uses hundreds of thousands of CPU cores and more than 400 PB of storage resources of different types. In LHC Run 3, which started in 2022, ALICE is running with an upgraded detector and an entirely new data acquisition system (DAQ), capable of collecting 100 times more events than the previous setup. One of the key elements of the new DAQ is the Event Processing Nodes (EPN) farm, which currently comprises 250 servers, each equipped with 8 AMD MI50 GPU accelerators. The role of the EPN cluster is to compress the detector data in real time. During heavy-ion data taking, the experiment collects about 900 GB/s from the sensors; this stream is compressed down to 100 GB/s and then written to a 130 PB persistent disk buffer for further processing. The EPNs process detector data streams of 10 ms duration, called Time Frames, independently of each other and write the output, called Compressed Time Frames (CTFs), to their local disks. The CTFs must be transferred to the disk buffer and removed from the EPNs as soon as possible so that data collection from the experiment can continue. These data transfer functions are performed by the new EPN2EOS system, introduced in the ALICE experiment for Run 3. EPN2EOS is highly optimized to perform the copy operations in parallel with the EPN data compression algorithms, and it has extensive monitoring and alerting capabilities to support the ALICE experiment operators. The service has been in production since November 2021. This paper presents its architecture and implementation, and an analysis of its first years of operation.
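To put the quoted rates in perspective, the following back-of-envelope sketch derives a few quantities from the figures in the abstract. It is an illustration only, not part of EPN2EOS: the variable names are hypothetical, decimal units are assumed (1 PB = 10^6 GB), the load is assumed to be spread evenly over the 250 servers, and the fill time assumes continuous heavy-ion data taking with nothing removed from the buffer.

```python
# Figures quoted in the abstract (decimal units assumed: 1 PB = 1e6 GB).
RAW_RATE_GB_S = 900   # detector output during heavy-ion data taking
CTF_RATE_GB_S = 100   # rate after compression on the EPN farm
BUFFER_PB = 130       # persistent disk buffer capacity
EPN_SERVERS = 250     # number of EPN servers

# Overall compression factor achieved by the EPN farm.
compression_factor = RAW_RATE_GB_S / CTF_RATE_GB_S

# Average CTF output each server must ship out,
# assuming an even spread across servers (an assumption, not a source figure).
per_node_rate_gb_s = CTF_RATE_GB_S / EPN_SERVERS

# Time to fill the buffer at the full compressed rate,
# assuming continuous data taking and no deletion from the buffer.
buffer_fill_seconds = BUFFER_PB * 1_000_000 / CTF_RATE_GB_S
buffer_fill_days = buffer_fill_seconds / 86_400

print(f"compression factor: {compression_factor:.0f}x")
print(f"average per-node transfer rate: {per_node_rate_gb_s:.1f} GB/s")
print(f"buffer fill time: {buffer_fill_days:.1f} days")
```

Under these assumptions each EPN has to sustain roughly 0.4 GB/s of outgoing transfers on average, and the buffer would absorb about 15 days of uninterrupted heavy-ion data.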