Vietnam Journal of Computer Science (May 2017)

Efficiently mining association rules based on maximum single constraints

  • Anh Tran,
  • Tin Truong,
  • Bac Le

DOI
https://doi.org/10.1007/s40595-017-0096-2
Journal volume & issue
Vol. 4, no. 4
pp. 261 – 277

Abstract

Read online

Abstract A serious problem encountered during the mining of association rules is the exponential growth of their cardinality. Unfortunately, the known algorithms for mining association rules typically generate scores of redundant and duplicate rules. Thus, we not only waste CPU but also encounter difficulties saving, managing and using the results of these algorithms. The present paper focuses on the discovery of association rules in which the left-handed and right-handed sides contain in two user-supplied maximum single constraints. If the constraints appear on or differ from a lattice of closed itemsets (together with their typically undersized generators and supports) that have been mined and saved once, we quickly extract the corresponding frequent sub one. Using an equivalence relation based on the closure of the two rule sides, the association rule set with maximum single constraints is partitioned into disjoint equivalence classes. Without loss of generality, it is necessary to consider mining each class independently. This helps avoid the wasteful generation of numerous candidates, reduces the burden of storing the support and confidence of rules in the same class and establishes a foundation for mining algorithms in parallel and distributed environments. In each class, the rules are represented as unique and explicit via the corresponding closed itemsets and generators. Due to the low cardinality and size of the generators, mining based on these representations, which does not generate duplicates, is very efficient. In the present paper, all these theoretical results are proven mathematically and used to construct the $$MAR\_MaxSC$$ M A R _ M a x S C algorithm. The efficiency of $$MAR\_MaxSC$$ M A R _ M a x S C compared with post-processing methods for mining association rules with maximum single constraints is then verified on several characteristic databases.

Keywords