IEEE Access (Jan 2018)

TUB-HAUPM: Tighter Upper Bound for Mining High Average-Utility Patterns

  • Jimmy Ming-Tai Wu,
  • Jerry Chun-Wei Lin,
  • Matin Pirouz,
  • Philippe Fournier-Viger

DOI
https://doi.org/10.1109/ACCESS.2018.2820740
Journal volume & issue
Vol. 6
pp. 18655 – 18669

Abstract

Read online

High-utility itemset mining (HUIM) has been gaining popularity in the field of data mining. Frequent itemset mining used to be the main tool to reveal high-frequency patterns but failed to consider the concept of profit. HUIM, on the other hand, obtains the itemsets and is practical in commercial applications. A main challenge in HUIM is that HUIM should handle the exponential search space for HUIM when the number of distinct items and the size of the database are both too large. The other challenge is that existing HUIM methods overlook the length of high-utility itemsets; hence, a large itemset gets an unreasonable estimated profit as opposed to the actual value. Therefore, several algorithms were proposed to mine high average-utility itemsets. High average-utility itemset mining (HAUIM) is an extension for the traditional HUIM, which provides a different measure with HUIM. It discovers utility patterns by considering both their utilities and lengths. To reduce the searching space in HAUIM, average-utility upper-bound, looser upper-bound utility, and a revised tighter upper-bound model are proposed to prune the searching graph in HAUIM. These three upper-bounds for high average-utility itemsets decrease the number of candidate patterns efficiently. However, they still overestimate a high average-utility itemset and waste on assessing the unnecessary patterns. Two new tighter upper-bounds, maximum following utility upper-bound and top-k transaction-maximum utility upper-bound, are proposed in this paper to further contract the size of candidate pattern set. Experiments conducted on several benchmark data sets show that the proposed method outperforms the previous HAUIM algorithms in terms of runtime, the number of join operations, and scalability.

Keywords