EPJ Web of Conferences (Jan 2024)

Utilizing Distributed Heterogeneous Computing with PanDA in ATLAS

  • Maeno Tadashi,
  • Alekseev Aleksandr,
  • Barreiro Megino Fernando Harald,
  • De Kaushik,
  • Guan Wen,
  • Karavakis Edward,
  • Klimentov Alexei,
  • Korchuganova Tatiana,
  • Lin FaHui,
  • Nilsson Paul,
  • Wenaus Torre,
  • Yang Zhaoyu,
  • Zhao Xin

DOI
https://doi.org/10.1051/epjconf/202429504053
Journal volume & issue
Vol. 295
p. 04053

Abstract

In recent years, advanced and complex analysis workflows have gained increasing importance in the ATLAS experiment at CERN, one of the large scientific experiments at the LHC. Support for such workflows has allowed users to exploit remote computing resources and service providers distributed worldwide, overcoming the limitations of local resources and services. The spectrum of computing options keeps widening: the Worldwide LHC Computing Grid (WLCG), volunteer computing, high-performance computing, commercial clouds, and emerging service levels such as Platform-as-a-Service (PaaS), Container-as-a-Service (CaaS) and Function-as-a-Service (FaaS), each bringing its own advantages and constraints. Users can benefit significantly from these providers, but dealing with multiple providers is cumbersome, even within a single analysis workflow whose fine-grained requirements stem from the nature and characteristics of the users' applications. In this paper, we first highlight issues in geographically distributed heterogeneous computing, such as insulating users from the complexities of dealing with remote providers, smart workload routing, complex resource provisioning, seamless execution of advanced workflows, workflow description, pseudo-interactive analysis, and the integration of PaaS, CaaS, and FaaS providers. We also outline the solutions developed in ATLAS with the Production and Distributed Analysis (PanDA) system, as well as future challenges for LHC Run 4.