EPJ Web of Conferences (Jan 2024)

ROOT’s RNTuple I/O Subsystem: The Path to Production

  • Blomer Jakob,
  • Canal Philippe,
  • de Geus Florine,
  • Hahnfeld Jonas,
  • Naumann Axel,
  • Lopez-Gomez Javier,
  • Lazzari Miotto Giovanna,
  • Padulano Vincenzo Eduardo

DOI: https://doi.org/10.1051/epjconf/202429506020
Journal volume & issue: Vol. 295, p. 06020

Abstract

The RNTuple I/O subsystem is ROOT’s future event data file format and access API. It is driven by the expected data volume increase at upcoming HEP experiments, e.g. at the HL-LHC, and by recent opportunities in the storage hardware and software landscape, such as NVMe drives and distributed object stores. RNTuple is a redesign of the TTree binary format and API and has been shown to deliver substantially faster data throughput and better data compression compared both to TTree and to industry-standard formats. To let HENP computing workflows benefit from RNTuple’s superior performance, however, the I/O stack needs to connect efficiently to the rest of the ecosystem, from grid storage to (distributed) analysis frameworks to (multithreaded) experiment frameworks for reconstruction and ntuple derivation. With the RNTuple binary format soon arriving at its first production release, we present RNTuple’s feature set, integration efforts, and its performance impact on the time-to-solution. We show the latest performance figures for RDataFrame analysis code of realistic complexity, comparing RNTuple and TTree as data sources. We discuss RNTuple’s approach to functionality critical to HENP I/O (such as multithreaded writes, fast data merging, and schema evolution), and we provide an outlook on the road to its use in production.