Job CPU Performance comparison based on MINIAOD reading options: Local versus remote

Balcas Justas; Newman Harvey; Bhat Preeti P.; Uppalapati Sravya; Moya Andres; Iordache Catalin; Sirvinskas Raimondas

doi:10.1051/epjconf/202429504028

EPJ Web of Conferences (Jan 2024)

Affiliations

All authors: George W. Downs Laboratory of Physics and Charles C. Lauritsen Laboratory of High Energy Physics, 1200 E California Blvd, Pasadena

Journal volume & issue: Vol. 295, p. 04028

Abstract

A critical challenge in performing data transfers or remote reads is to make them as fast and efficient as possible while keeping system resource usage as low as possible. Ideally, the software that manages these transfers should organize them so that they can run up to the hardware limits. Significant portions of LHC analysis use the same datasets, running over each file or dataset multiple times. By using on-demand regional caches, we can improve CPU efficiency and reduce wide area network usage. Speeding up user analysis and reducing network usage (and hiding latency from jobs by caching the most essential files on demand) are significant challenges for the HL-LHC, where data volumes increase to the exabyte level. In this paper, we describe our experience with and tests of the CMS XCache project (SoCal Cache), comparing job performance and CPU efficiency across different storage solutions (Hadoop, Ceph, local disk, Named Data Networking). We also provide insights from our tests over a wide area network and into possible storage and network usage savings.
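The on-demand caching idea in the abstract can be illustrated with a minimal sketch, assuming a hypothetical regional cache that fetches a whole file from a remote origin on first access and serves later reads locally. The class name `OnDemandCache`, the file names and sizes, the whole-file granularity, and the LRU eviction policy are illustrative assumptions, not the XCache implementation or the SoCal Cache configuration. The sketch only shows why repeated passes over the same files turn wide area network reads into local cache hits.

```python
from collections import OrderedDict


class OnDemandCache:
    """Hypothetical read-through cache with LRU eviction (illustration only)."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.files = OrderedDict()  # file name -> size in bytes, least recently used first
        self.wan_bytes = 0          # bytes fetched from the remote origin over the WAN
        self.hits = 0
        self.misses = 0

    def read(self, name, size):
        if name in self.files:
            # Cache hit: serve the file locally and mark it most recently used.
            self.files.move_to_end(name)
            self.hits += 1
            return
        # Cache miss: fetch the whole file over the WAN on demand.
        self.misses += 1
        self.wan_bytes += size
        if size > self.capacity_bytes:
            return  # too large to cache; this read is served remotely only
        # Evict least recently used files until the new file fits.
        while self.used_bytes + size > self.capacity_bytes:
            _, evicted_size = self.files.popitem(last=False)
            self.used_bytes -= evicted_size
        self.files[name] = size
        self.used_bytes += size


GB = 10**9
cache = OnDemandCache(capacity_bytes=10 * GB)
dataset = [f"file{i}.root" for i in range(4)]  # hypothetical MINIAOD file names
for _ in range(3):  # three analysis passes over the same dataset
    for name in dataset:
        cache.read(name, 2 * GB)

print(cache.misses, cache.hits, cache.wan_bytes // GB)  # → 4 8 8
```

In this toy run, only the first pass crosses the wide area network (8 GB); the second and third passes are served from the cache. A real cache such as XCache typically works at block rather than whole-file granularity, so this should be read as a conceptual model only.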

Published in EPJ Web of Conferences

ISSN: 2100-014X (Online)
Publisher: EDP Sciences
Country of publisher: France
LCC subjects: Science: Physics
Website: http://www.epj-conferences.org/

About the journal