EPJ Web of Conferences (Jan 2024)

Site Sonar - A Flexible and Extensible Infrastructure Monitoring Tool for ALICE Computing Grid

  • Wijethunga Kalana,
  • Storetvedt Maksim,
  • Grigoras Costin,
  • Betev Latchezar,
  • Litmaath Maarten,
  • Amarasinghe Gayashan,
  • Perera Indika

DOI: https://doi.org/10.1051/epjconf/202429504037
Journal volume & issue: Vol. 295, p. 04037

Abstract

The ALICE experiment at the CERN Large Hadron Collider relies on a massive, distributed Computing Grid for its data processing. The ALICE Computing Grid is built by combining a large number of individual computing sites distributed globally. These Grid sites are maintained by different institutions across the world and contribute thousands of worker nodes possessing different capabilities and configurations. Developing software for Grid operations that works on all nodes while harnessing the maximum capabilities offered by any given Grid site is challenging without advance knowledge of what capabilities each site offers. Site Sonar is an architecture-independent Grid infrastructure monitoring framework developed by the ALICE Grid team to monitor the infrastructure capabilities and configurations of worker nodes at sites across the ALICE Grid without the need to contact local site administrators. Site Sonar is a highly flexible and extensible framework that offers infrastructure metric collection without local agent installations at Grid sites. This paper introduces the Site Sonar Grid infrastructure monitoring framework and reports significant findings acquired about the ALICE Computing Grid using Site Sonar.

Keywords