IEEE Access (Jan 2023)

BPR: An Erasure Coding Batch Parallel Repair Approach in Distributed Storage Systems

  • Ying Song,
  • Wenxuan Zhao,
  • Bo Wang

DOI
https://doi.org/10.1109/ACCESS.2023.3257404
Journal volume & issue
Vol. 11
pp. 44509 – 44518

Abstract

Erasure coding is widely used in distributed systems because it provides reliability for large amounts of data at low storage overhead. However, when a distributed system loses data across a large number of stripes and must recover them in batches, current recovery methods either repeat the single-stripe recovery method for every stripe or optimize only partial stripe recovery. Both approaches incur heavy upload and download repair traffic and an imbalanced load, which lowers the efficiency of fault recovery and wastes resources. In this paper, we propose BPR, an erasure coding batch parallel repair approach for distributed storage systems. BPR classifies the stripes and recovers their data in batches through forward and reverse parallel data recovery, which reduces cross-rack network transfer time and increases recovery throughput. Experimental results show that, for large-scale stripe recovery, BPR reduces the cross-rack network transfer time by up to 10% and increases the recovery throughput by up to 8% compared with rPDL in some scenarios.

Keywords