Applied Sciences (Jul 2020)

An Efficient Two-Level-Partitioning-Based Double Array and Its Parallelization

  • Lianyin Jia,
  • Chongde Zhang,
  • Mengjuan Li,
  • Yinong Chen,
  • Yong Liu,
  • Jiaman Ding

DOI
https://doi.org/10.3390/app10155266
Journal volume & issue
Vol. 10, no. 15
p. 5266

Abstract

Read online

Trie is one of the most common data structures for string storage and retrieval. As a fast and efficient implementation of trie, double array (DA) can effectively compress strings to reduce storage spaces. However, this method suffers from the problem of low index construction efficiency. To address this problem, we design a two-level partition (TLP) framework in this paper. We first divide the dataset is into smaller lower-level partitions, and then we merge these partitions into bigger upper-level partitions using a min-heap based greedy merging algorithm (MH-GMerge). TLP has an excellent characteristic of load balancing and can be easily parallelized. We implemented two efficient parallel partitioned DAs based on TLP. Extensive experiments were carried out, and the results showed that the proposed methods can significantly improve the construction efficiency of DA and can achieve a better trade-off between construction and retrieval performance than the existing state-of-the-art methods.

Keywords