Inverted Index Automata Frequent Itemset Mining for Large Dataset Frequent Itemset Mining

Xin Dai; Haza Nuzly Abdull Hamed; Qichen Su; Xue Hao

doi:10.1109/ACCESS.2024.3521285

IEEE Access (Jan 2024)

Affiliations

Xin Dai: Faculty of Computing, Universiti Teknologi Malaysia (UTM), Johor, Malaysia
Haza Nuzly Abdull Hamed: Faculty of Computing, Universiti Teknologi Malaysia (UTM), Johor, Malaysia
Qichen Su: Faculty of Computing, Universiti Teknologi Malaysia (UTM), Johor, Malaysia
Xue Hao: Faculty of Computing, Universiti Teknologi Malaysia (UTM), Johor, Malaysia

DOI: https://doi.org/10.1109/ACCESS.2024.3521285
Journal volume & issue: Vol. 12
pp. 195111 – 195130

Abstract

Frequent itemset mining (FIM) faces significant challenges as datasets grow in scale. Traditional algorithms such as Apriori, FP-Growth, and Eclat scale poorly and run inefficiently on modern datasets characterized by high dimensionality and high density. These algorithms demand substantial memory and multiple database scans, which limits their practicality for rapid data processing. To address these challenges, this study proposes the Inverted Index Automata Frequent Itemset Mining (IA-FIM) algorithm. IA-FIM combines the fast retrieval of an inverted index with the pattern recognition capability of finite automata, enabling efficient processing of large datasets. Unlike conventional FIM algorithms, IA-FIM uses an inverted index automaton to reduce the search space and memory footprint, eliminating repeated database scans and multiple tree constructions. The algorithm employs a single-pass scan strategy, constructing a dynamic, adjustable inverted index that provides a compact representation of the data. IA-FIM demonstrates superior performance on large sparse datasets, increasing processing speed on large datasets and meeting the demands of the big data era. By reducing repeated scans and heavy memory dependence, it also makes FIM more efficient and practical for large datasets.
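
The abstract does not specify IA-FIM's data structures or its automaton component, so the following is not the authors' algorithm. It is only a minimal Python sketch of the general idea the abstract mentions: one pass over the transactions builds an inverted index (item to set of transaction ids), and itemset supports are then obtained by intersecting those sets instead of rescanning the database. The function names, the depth-first extension strategy, and the absolute `min_support` count are all assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(transactions):
    """Single pass over the data: map each item to the ids of the transactions containing it."""
    index = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            index[item].add(tid)
    return index


def mine_frequent_itemsets(transactions, min_support):
    """Return {itemset as sorted tuple: support count} for itemsets with support >= min_support.

    Hypothetical helper for illustration; supports come from set intersections
    on the inverted index, not from additional database scans.
    """
    index = build_inverted_index(transactions)
    # Keep only frequent single items, in a fixed order so each itemset is generated once.
    items = sorted(item for item, tids in index.items() if len(tids) >= min_support)
    result = {}

    def extend(prefix, prefix_tids, start):
        for pos in range(start, len(items)):
            item = items[pos]
            # Transactions containing the prefix and this item.
            tids = index[item] if prefix_tids is None else prefix_tids & index[item]
            if len(tids) >= min_support:
                itemset = prefix + (item,)
                result[itemset] = len(tids)
                # Only frequent itemsets are extended (anti-monotone pruning).
                extend(itemset, tids, pos + 1)

    extend((), None, 0)
    return result


transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]
frequent = mine_frequent_itemsets(transactions, min_support=3)
```

With these five transactions and a minimum support of 3, each single item has support 4 and each pair has support 3, while {a, b, c} occurs in only two transactions and is pruned. How IA-FIM itself organizes the index, and how the finite automaton is involved, is described in the full paper rather than in this abstract.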

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
