Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki (May 2015)
ANALYSIS AND ESTIMATION OF THE TRIE MINIMUM LEVEL IN NON-HASH DEDUPLICATION SYSTEM
Abstract
Subject of research. The paper deals with a method of restriction for the trie minimum level in non-hash data deduplication system. Method. The subject matter of the method lies in forcibly completing the trie to a specific minimum level. The proposed method makes it possible to increase performance of the process by reducing the number of collisions at the lower levels ofthe trie. The maximum theoretical performance growth corresponds to the share of collisions in the total number of data read operations from the storage medium. Proposed method application increases the metadata size to the amount of new structures containing one element. Main results. The results of the work have been proved by the data of computational experiment with non-has deduplication on 528 GB data set. The process analysis has shown that 99% of the execution time is taken to head positioning of hard-drives. The reason is a random distribution of the blocks on the storage medium. Application of the method of minimum level restriction for the trie in non-hash data deduplication system on the experimental data set gives the possibility to increase performance maximum by 16% and the increase of metadata size is 49%. The total amount of metadata is 34% less than with hash-based deduplication using the MD5 algorithm, and is 17% less than using Tiger192 algorithm. These results confirm the effectiveness of the proposed method. Practical relevance. The proposed method increases the performance of deduplication process by reducing the number of collisions in the trie construction. The results are of practical importance for professionals involved in the development of non-hash data deduplication methods.
Keywords