EPJ Web of Conferences (Jan 2021)

Addressing a billion-entries multi-petabyte distributed file system backup problem with cback: from files to objects

  • Valverde Cameselle Roberto,
  • Gonzalez Labrador Hugo

DOI
https://doi.org/10.1051/epjconf/202125102071
Journal volume & issue
Vol. 251
p. 02071

Abstract

CERNBox is the cloud collaboration hub at CERN. The service has more than 37,000 user accounts, and backing up user and project space data is critical for it. The underlying storage system hosts over a billion files, which amount to 12 PB of storage distributed over thousands of disks with a two-replica layout. Performing a backup operation over this volume of data and number of files is a non-trivial task. The original CERNBox backup system (an in-house event-driven file-level system) has been reconsidered and replaced by a new distributed and scalable backup infrastructure based on the open source tool RESTIC. The new system, codenamed cback, provides the features needed in the HEP community to guarantee data safety and smooth operation for the system administrators. With this new development, daily snapshot-based backups of all our user and project areas are possible, along with automatic verification and restores. The backup data is also de-duplicated in blocks and stored as objects in a disk-based S3 cluster in another geographical location on the CERN campus, reducing storage costs and protecting critical data from major catastrophic events. We report on the design of the system, the operational experience of running it, and possible future improvements.
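
The idea of de-duplicating backup data in blocks and storing the blocks as objects can be illustrated with a minimal sketch. It is not the cback or RESTIC implementation: it assumes fixed-size blocks for simplicity, uses an in-memory dictionary as a stand-in for the S3 cluster, and all names (`ObjectStore`, `backup`, `restore`, `CHUNK_SIZE`) are hypothetical.

```python
import hashlib

# Assumption: a tiny fixed block size, chosen only to keep the demo readable.
CHUNK_SIZE = 4


class ObjectStore:
    """Hypothetical in-memory stand-in for an object store such as S3."""

    def __init__(self):
        self.objects = {}

    def put(self, data: bytes) -> str:
        # The object key is the SHA-256 digest of the block content,
        # so identical blocks map to the same key and are stored once.
        key = hashlib.sha256(data).hexdigest()
        if key not in self.objects:
            self.objects[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self.objects[key]


def backup(data: bytes, store: ObjectStore, chunk_size: int = CHUNK_SIZE) -> list:
    """Split data into blocks, store each block, return the ordered key list."""
    return [
        store.put(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def restore(manifest: list, store: ObjectStore) -> bytes:
    """Rebuild the original data from the ordered list of block keys."""
    return b"".join(store.get(key) for key in manifest)


if __name__ == "__main__":
    store = ObjectStore()
    first = backup(b"AAAABBBBAAAA", store)   # repeated block "AAAA"
    second = backup(b"AAAABBBBCCCC", store)  # shares two blocks with the first
    print(len(first), "blocks referenced,", len(store.objects), "objects stored")
    assert restore(first, store) == b"AAAABBBBAAAA"
    assert restore(second, store) == b"AAAABBBBCCCC"
```

In this sketch a second snapshot that shares blocks with an earlier one only adds the blocks that are new, which is the property that lets block-level de-duplication reduce storage costs for daily snapshots.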