Jisuanji kexue yu tansuo (Jan 2022)
Hybrid Parallel Frequent Itemsets Mining Algorithm by Using N-List Structure
Abstract
Aiming at the problem of unbalanced load, bad efficiency of N-list merge and redundant search for each node based on parallel frequent itemset mining algorithm MRPrePost (parallel PrePost algorithm based on MapReduce), this paper proposes a hybrid parallel frequent itemset mining algorithm based on N-list (HP-FIMBN). Firstly, a load estimation function (LE) is designed to calculate the load of each item in F-list, and a grouping method based on greedy strategy (GM-GS) is proposed, which not only deals with the problem of unbalanced load in the process of data partitioning, but also decreases the size of sub-PPC-tree on local node. Secondly, in order to accelerate the completion of the merging of two N-list structures, it designs an early abandon strategy (EAS), which can not only efficiently avoid invalid calculations in the merging process, but also does not need to traverse the initial N-list structure to get the final result. Finally, the set-enumeration tree is adopted as the search space, a superset equivalent strategy (SES) is proposed to avoid redundant search in the mining process and the final mining results are generated. Experimental results show that the modified algorithm has better performance on mining frequent itemsets in big data environment.
Keywords