EPJ Web of Conferences (Jan 2020)
CMS data access and usage studies at PIC Tier-1 and CIEMAT Tier-2
Abstract
The current computing models of the LHC experiments indicate that the HL-LHC era (2026+) will require much larger resource increases than technology evolution at a constant budget can provide. Since the worldwide budget for computing is not expected to increase, many research activities have emerged to improve the performance of the LHC processing software applications, as well as to propose more efficient resource deployment scenarios and data management techniques that might reduce this expected growth in resource needs. The massively increasing amounts of data to be processed lead to enormous challenges for HEP storage systems, networks and the distribution of data to end-users. These challenges are particularly important in scenarios in which the LHC data would be distributed from a small number of centers holding the experiment’s data. Enabling data locality relative to computing tasks via local caches on sites appears to be a very promising approach to hide transfer latencies while reducing the overall deployed storage space and number of replicas. However, its effectiveness depends strongly on the I/O characteristics of the workflows and on the network available across sites. A crucial input for evaluating and simulating the benefits of several of the new proposals emerging within WLCG/HSF is an assessment of how the experiments access and use the storage services deployed at WLCG sites. This report reviews storage access and usage studies, together with data popularity studies, for the CMS workflows executed at the Spanish Tier-1 (PIC) and Tier-2 (CIEMAT) sites supporting CMS activities, based on local and experiment monitoring data spanning more than one year. These results are relevant for simulating data caches for end-user analysis data, as well as for identifying potential areas for storage savings.