Symmetry (Oct 2021)
Fingerprint-Based Data Deduplication Using a Mathematical Bounded Linear Hash Function
Abstract
Due to the quick increase in digital data, especially in mobile usage and social media, data deduplication has become a vital and cost-effective approach for removing redundant data segments, reducing the pressure imposed by enormous volumes of data that must be kept. As part of the data deduplication process, fingerprints are employed to represent and identify identical data blocks. However, when the amount of data increases, the number of fingerprints grows as well, and due to the restricted memory size, the speed of data deduplication suffers dramatically. Various deduplication solutions show a bottleneck in the form of matching lookups and chunk fingerprint calculations, for which we pay in the form of storage and processors needed for storing hashes. Utilizing a fast hash algorithm to improve the fingerprint lookup performance is an appealing challenge. Thus, this study is focused on enhancing the deduplication system by suggesting a novel and effective mathematical bounded linear hashing algorithm that decreases the hashing time by more than two times compared to MD5 and SHA-1 and reduces the size of the hash index table by 50%. Due to the enormous number of chunk hash values, looking up and comparing hash values takes longer for large datasets; this work offers a hierarchal fingerprint lookup strategy to minimize the hash judgement comparison time by up to 78%. Our suggested system reduces the high latency imposed by deduplication procedures, primarily the hashing and matching phases. The symmetry of our work is based on the balance between the proposed hashing algorithm performance and its reflection on the system efficiency, as well as evaluating the approximate symmetries of the hashing and lookup phases compared to the other deduplication systems.
Keywords