IEEE Access (Jan 2024)

Probabilistic Support Prediction: Fast Frequent Itemset Mining in Dense Data

  • Muhammad Sadeequllah,
  • Azhar Rauf,
  • Saif Ur Rehman,
  • Noha Alnazzawi

DOI
https://doi.org/10.1109/ACCESS.2024.3376477
Journal volume & issue
Vol. 12
pp. 39330–39350

Abstract


Frequent itemset mining (FIM) is a highly resource-demanding data-mining task fundamental to numerous data-mining applications. Support calculation is a frequently performed, computation-intensive operation of FIM algorithms, whereas storing transactional data is memory-intensive. FIM is even more resource-hungry for dense data than for sparse data, and the rapidly growing size of datasets further exacerbates this situation, necessitating highly efficient new solutions. This paper proposes a novel approach to frequent itemset mining for dense datasets. After the initial stage, the approach does not use transactional data, which makes it memory-efficient. It also replaces processing-intensive support calculations with efficient, probabilistic support predictions that need no transactional data: to predict the support of an itemset, only the supports of its subsets are needed. This technique, however, works only for itemsets of size three or higher. We also propose an FIM algorithm, ProbBF, that incorporates this technique. ProbBF discards the transactional data after using it to compute the frequent itemsets of sizes one and two. For itemsets of size $k$, where $k \ge 3$, ProbBF uses the proposed probabilistic technique to predict their support; an itemset is considered frequent if its predicted support exceeds a given threshold. Our experiments show that ProbBF is efficient in both time and space compared to state-of-the-art FIM algorithms that use transactional data, and that it successfully generates the majority of the frequent itemsets on real-world datasets. Since ProbBF is probabilistic, some loss in quality is inevitable.
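The sketch below illustrates the pipeline described in the abstract: one scan of the transactions collects supports of 1- and 2-itemsets, after which the transactional data can be discarded and the support of larger itemsets is predicted from subset supports alone. The prediction formula shown (a simple independence-style approximation built from pairwise and single supports) is an assumption for illustration only; the paper's actual probabilistic technique, the function names, and the toy data are not taken from the article.

```python
from itertools import combinations
from collections import Counter

def count_small_itemsets(transactions):
    """Count supports of all 1- and 2-itemsets in a single scan of the data."""
    c1, c2 = Counter(), Counter()
    for t in transactions:
        items = sorted(set(t))
        c1.update((i,) for i in items)          # 1-itemset supports
        c2.update(combinations(items, 2))       # 2-itemset supports
    return c1, c2

def predict_support(itemset, c1, c2, n):
    """Predict the support of a 3-itemset from its 1- and 2-subsets.

    Stand-in independence-style estimate (assumed, not the authors' formula):
        P(ABC) ~= P(AB) * P(AC) * P(BC) / (P(A) * P(B) * P(C))
    """
    a, b, c = sorted(itemset)
    p_pairs = (c2[(a, b)] * c2[(a, c)] * c2[(b, c)]) / n**3
    p_singles = (c1[(a,)] * c1[(b,)] * c1[(c,)]) / n**3
    return n * p_pairs / p_singles if p_singles else 0.0

# Toy usage: itemsets whose predicted support clears min_sup are reported frequent,
# without ever revisiting the transactions.
transactions = [("a", "b", "c"), ("a", "b"), ("a", "c"), ("b", "c"), ("a", "b", "c")]
n, min_sup = len(transactions), 2
c1, c2 = count_small_itemsets(transactions)
items = sorted({i for t in transactions for i in t})
frequent3 = [s for s in combinations(items, 3)
             if predict_support(s, c1, c2, n) >= min_sup]
print(frequent3)  # [('a', 'b', 'c')] with predicted support ~2.1 (true support 2)
```

Since the prediction uses only subset supports, no transactional data is kept in memory beyond the initial scan, which is the memory advantage the abstract attributes to ProbBF; the trade-off is that predicted supports are approximate, so some itemsets may be misclassified.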

Keywords