An algebraic semigroup method for discovering maximal frequent itemsets

Liu Jiang; Li Jing; Ni Feng; Xia Xiang; Li Shunlong; Dong Wenhui

doi:10.1515/math-2022-0516

Open Mathematics (Nov 2022)

An algebraic semigroup method for discovering maximal frequent itemsets

Liu Jiang,
Li Jing,
Ni Feng,
Xia Xiang,
Li Shunlong,
Dong Wenhui

Affiliations

Liu Jiang: Department of Systems Science, University of Shanghai for Science and Technology, Shanghai 200093, China
Li Jing: Department of Systems Science, University of Shanghai for Science and Technology, Shanghai 200093, China
Ni Feng: Department of Systems Science, University of Shanghai for Science and Technology, Shanghai 200093, China
Xia Xiang: Department of Systems Science, University of Shanghai for Science and Technology, Shanghai 200093, China
Li Shunlong: Department of Systems Science, University of Shanghai for Science and Technology, Shanghai 200093, China
Dong Wenhui: Department of Systems Science, University of Shanghai for Science and Technology, Shanghai 200093, China

DOI: https://doi.org/10.1515/math-2022-0516
Journal volume & issue: Vol. 20, no. 1
pp. 1432 – 1443

Abstract

Read online

Discovering maximal frequent itemsets is an important issue and key technique in many data mining problems such as association rule mining. In the literature, generating maximal frequent itemsets proves either to be NP-hard or to have O(l34l(m+n))O\left({l}^{3}{4}^{l}\left(m+n)) complexity in the worst case from the perspective of generating maximal complete bipartite graphs of a bipartite graph, where mm, nn are the item number and the transaction number, respectively, and ll denotes the maximum of ∣C∣∣Ψ(C)∣/(∣C∣+∣Ψ(C)∣−1)| C| | \Psi \left(C)| \hspace{0.1em}\text{/}\hspace{0.1em}\left(| C| +| \Psi \left(C)| -1), with the maximum taken over all maximal frequent itemsets CC. In this article, we put forward a method for discovering maximal frequent itemsets, whose complexity is O(3mn2β+4βn)O\left(3mn{2}^{\beta }+{4}^{\beta }n), lower than the known complexity both in the worst case, from the perspective of semigroup algebra, where β\beta is the number of items whose support is more than the minimum support threshold. Experiments also show that an algorithm based on the algebraic method performs better than the other three well-known algorithms. Meanwhile, we explore some algebraic properties with respect to items and transactions, prove that the maximal frequent itemsets are exactly the simplified generators of frequent itemsets, give a necessary and sufficient condition for a maximal i+1i+1-frequent itemset being a subset of a closed ii-frequent itemset, and provide a recurrence formula of maximal frequent itemsets.

Published in Open Mathematics

ISSN: 2391-5455 (Online)
Publisher: De Gruyter
Country of publisher: Poland
LCC subjects: Science: Mathematics
Website: https://www.degruyter.com/journal/key/math/html

About the journal

Abstract

Keywords