EPJ Web of Conferences (Jan 2019)

Scaling the EOS namespace – new developments, and performance optimizations

  • Bitzes Georgios,
  • Sindrilaru Elvin Alin,
  • Joachim Peters Andreas

DOI
https://doi.org/10.1051/epjconf/201921404019
Journal volume & issue
Vol. 214
p. 04019

Abstract

EOS is the distributed storage solution being developed and deployed at CERN with the primary goal of fulfilling the data needs of the LHC and its various experiments. In production since 2011, EOS currently manages around 256 petabytes of raw disk space and 3.4 billion files across several instances. Nowadays, EOS is increasingly being used as a distributed filesystem and file sharing platform, which poses scalability challenges for its legacy namespace subsystem, tasked with keeping track of all file and directory metadata on a particular instance. In this paper we discuss these challenges and present our solution, which has recently entered production. We made several architectural improvements to the overall system design, the most important of which was introducing QuarkDB, a highly available datastore capable of serving as the metadata backend for EOS, tailored to the needs of the namespace. We also describe our efforts to provide latency and performance comparable to the legacy in-memory implementation, both when reading, through extensive caching and prefetching, and when writing, through latency-hiding techniques involving a persistent, back-pressured local queue for batching writes towards the QuarkDB backend.