IEEE Access (Jan 2021)

Research on Routing Strategy in Cluster Deduplication System

  • Qinlu He,
  • Genqing Bian,
  • Weiqi Zhang,
  • Fan Zhang,
  • Shengqiang Duan,
  • Fenglang Wu

DOI
https://doi.org/10.1109/ACCESS.2021.3116270
Journal volume & issue
Vol. 9
pp. 135485 – 135495

Abstract

A cluster deduplication system coordinates the work of multiple nodes and can thereby alleviate the disk index bottleneck found in large-scale data backup systems. However, the nodes behave as isolated islands of information during deduplication. When the servers route data by querying the nodes (stateful routing), a high deduplication rate comes at the cost of large system overhead and low throughput. Conversely, servers that adopt stateless routing cannot achieve a high deduplication rate. The data routing strategy therefore strongly affects the overall performance of the system. This paper proposes the concept of data frequency and designs a classified routing strategy based on it. The metadata server maintains a Bloom filter with byte-sized counters that records how often each data block occurs. The frequency of a data block is then compared with a configured threshold to determine whether the block is “hot data”. We use stateful routing to send “cold data” to the storage nodes and stateless routing to send hot data to the storage nodes. Experimental results show that the frequency-based classified routing algorithm greatly reduces system overhead while preserving the deduplication rate, and that it improves system throughput and real-time processing capability. Compared with a fully stateful routing scheme, our method loses less than 2% of the deduplication rate while reducing the communication query overhead by more than 25%.
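The frequency-based classification described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class and function names, the slot count, the number of hash functions, the SHA-256-derived slot positions, and the saturation of counters at 255 are all assumptions made for illustration. Only the overall idea (byte-sized counters, a configured threshold, stateful routing for cold data and stateless routing for hot data) comes from the abstract.

```python
import hashlib


class CountingBloomFilter:
    """Bloom filter whose slots are one-byte saturating counters (illustrative)."""

    def __init__(self, num_slots=1 << 16, num_hashes=4):
        self.num_slots = num_slots      # assumed size, not from the paper
        self.num_hashes = num_hashes    # assumed hash count, not from the paper
        self.slots = bytearray(num_slots)

    def _positions(self, fingerprint):
        # Derive num_hashes slot indices by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + fingerprint).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_slots

    def add(self, fingerprint):
        # Increment every slot for this block, saturating at the byte maximum.
        for pos in self._positions(fingerprint):
            if self.slots[pos] < 255:
                self.slots[pos] += 1

    def estimate(self, fingerprint):
        # The smallest counter bounds the block's true frequency from above.
        return min(self.slots[pos] for pos in self._positions(fingerprint))


def classify_and_route(cbf, fingerprint, threshold, num_nodes):
    """Record one occurrence, then pick a routing mode from the frequency."""
    cbf.add(fingerprint)
    if cbf.estimate(fingerprint) >= threshold:
        # Hot data: stateless routing, here a simple hash-modulo placement.
        node = int.from_bytes(fingerprint[:8], "big") % num_nodes
        return "hot", "stateless", node
    # Cold data: stateful routing; the target node must be found by querying
    # the storage nodes, which this sketch does not model.
    return "cold", "stateful", None
```

With a hypothetical threshold of 3, a block fingerprint is classified as cold on its first two occurrences and as hot from the third onward, at which point the sketch assigns it a node without any query.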

Keywords