IEEE Access (Jan 2020)
An Optimal Recovery Approach for Liberation Codes in Distributed Storage Systems
Abstract
To reduce storage costs, distributed storage systems are increasingly adopting erasure codes to ensure data reliability. Liberation codes, which satisfy the maximum distance separable (MDS) property and provide optimal modification overhead, are a popular class of double-fault-tolerant erasure codes. However, recovering from a single node failure with erasure codes requires reading large amounts of data from the surviving nodes and transferring it across the network. Existing single-node failure recovery approaches for Liberation codes are either time-consuming or suboptimal. In this article, we first prove the minimum number of symbols that must be read to recover one failed node in a Liberation-coded system. We then derive the conditions that optimal recovery solutions must satisfy. Finally, we propose an algorithm called Disk Read Optimal Recovery (DROR), which determines an optimal recovery solution in linear time and recovers the failed node by reading the minimum amount of data. We have implemented DROR in the real-world storage system Ceph and evaluated it on a cluster of Amazon EC2 instances. The results show that DROR reduces reconstruction time by up to 23.6% compared with the recovery approach in Ceph.
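The optimization problem described in the abstract can be illustrated with a small brute-force sketch in Python. The task is to choose parity equations so that every lost symbol can be solved while reading as few surviving symbols as possible. This is a sketch under stated assumptions, not DROR (which the abstract says runs in linear time), and the toy code is not the Liberation construction. `toy_equations` builds a hypothetical row-plus-diagonal XOR code, and all function names are illustrative. Each parity equation is modeled as a set of (disk, row) symbols whose XOR is zero. A selection of w equations is accepted only if, restricted to the failed disk, it has full rank over GF(2). This rank check also handles equations that contain more than one symbol of the failed disk, which can occur in Liberation codes' Q parity.

```python
from itertools import combinations
from typing import FrozenSet, List, Optional, Tuple

Symbol = Tuple[int, int]        # (disk index, row index within the stripe)
Equation = FrozenSet[Symbol]    # symbols whose XOR is zero (parity included)


def toy_equations(k: int, w: int) -> List[Equation]:
    """Hypothetical toy code (NOT Liberation): k data disks, row parity on
    disk k, diagonal parity on disk k + 1, w rows per stripe."""
    eqs: List[Equation] = []
    for r in range(w):
        # Row parity: P_r is the XOR of the data symbols in row r.
        eqs.append(frozenset([(d, r) for d in range(k)] + [(k, r)]))
    for r in range(w):
        # Diagonal parity: Q_r is the XOR of data symbols (d, (r + d) mod w).
        eqs.append(frozenset([(d, (r + d) % w) for d in range(k)] + [(k + 1, r)]))
    return eqs


def gf2_rank(vectors: List[int]) -> int:
    """Rank over GF(2) of bit vectors encoded as ints, via an XOR basis."""
    basis: List[int] = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)  # clears b's leading bit when v has it set
        if v:
            basis.append(v)
    return len(basis)


def min_read_recovery(
    eqs: List[Equation], failed_disk: int, w: int
) -> Optional[Tuple[set, Tuple[Equation, ...]]]:
    """Exhaustively pick w equations that determine all lost symbols and
    minimize the number of distinct surviving symbols that must be read."""
    lost = {(failed_disk, r) for r in range(w)}
    candidates = [e for e in eqs if e & lost]
    best = None
    for chosen in combinations(candidates, w):
        # Restrict each equation to the failed disk as a w-bit row mask.
        masks = [sum(1 << r for (d, r) in e if d == failed_disk) for e in chosen]
        if gf2_rank(masks) < w:
            continue  # these equations cannot solve every lost symbol
        reads = set().union(*(e - lost for e in chosen))
        if best is None or len(reads) < len(best[0]):
            best = (reads, chosen)
    return best  # None if the failed disk cannot be recovered


eqs = toy_equations(k=3, w=3)
reads, chosen = min_read_recovery(eqs, failed_disk=0, w=3)
print(len(reads))  # 7, versus 9 when only row parity is used
```

For this toy code with k = 3 and w = 3, recovering disk 0 by mixing row and diagonal equations reads 7 symbols instead of the 9 needed with row parity alone. The exhaustive search enumerates C(2w, w) equation subsets, so it is practical only for small w; it makes the objective concrete rather than standing in for a linear-time algorithm.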
Keywords