Incremental Top-k High Utility Pattern Mining and Analyzing Over the Entire Accumulated Dynamic Database

Chanhee Lee; Hanju Kim; Myungha Cho; Hyeonmo Kim; Bay Vo; Jerry Chun-Wei Lin; Philippe Fournier-Viger; Unil Yun

doi:10.1109/ACCESS.2024.3406562

IEEE Access (Jan 2024)

Affiliations

Chanhee Lee: Department of Computer Engineering, Sejong University, Seoul, South Korea
Hanju Kim: Department of Computer Engineering, Sejong University, Seoul, South Korea
Myungha Cho: Department of Computer Engineering, Sejong University, Seoul, South Korea
Hyeonmo Kim: Department of Computer Engineering, Sejong University, Seoul, South Korea
Bay Vo: HUTECH University, Ho Chi Minh City, Vietnam
Jerry Chun-Wei Lin: Department of Distributed Systems and IT Devices, Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Gliwice, Poland
Philippe Fournier-Viger: College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
Unil Yun: Department of Computer Engineering, Sejong University, Seoul, South Korea

DOI: https://doi.org/10.1109/ACCESS.2024.3406562
Journal volume & issue: Vol. 12, pp. 77605–77620

Abstract

Top-k high utility pattern mining, which extracts the k highest-utility patterns for a user-specified k, has been actively studied. Most previous studies in this domain have focused on static databases, where data insertions do not occur. In the real world, however, various applications continuously generate new data, and existing top-k high utility pattern mining algorithms designed for static databases cannot handle incremental databases. Although some methods can handle stream data, they are limited to processing a portion of the database rather than the entire accumulated database. In this paper, we propose an efficient incremental mining method that discovers top-k high utility patterns from the entire accumulated database. The proposed approach uses a list structure that stores the minimal utility information required for the mining process and does not generate candidate itemsets. The algorithm processes incremental data with a single database scan and restructures the list for efficient mining. Moreover, four efficient threshold raising techniques, along with a restoring technique, are used to calculate the optimal threshold value in an accumulated incremental environment. Experimental results on runtime, memory, and scalability show that the proposed method efficiently processes the entire incremental database.
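To make the problem statement concrete, the following is a minimal Python sketch of top-k high utility itemset mining with a basic form of threshold raising. It is not the algorithm proposed in the paper: it enumerates every candidate itemset by brute force (which the proposed list-based method explicitly avoids), and it handles an inserted batch by re-mining the accumulated database. The function names, the quantity/profit data layout, and the toy data are all assumptions made for illustration.

```python
import heapq
from itertools import combinations


def itemset_utility(itemset, transactions, profit):
    """Sum quantity * unit profit over every transaction containing all items."""
    total = 0
    for tx in transactions:
        if all(item in tx for item in itemset):
            total += sum(tx[item] * profit[item] for item in itemset)
    return total


def top_k_high_utility(transactions, profit, k):
    """Return up to k (utility, itemset) pairs, highest utility first.

    Illustrative brute force: every non-empty itemset is evaluated.
    The internal threshold starts at 0 and is raised to the k-th best
    utility seen so far, so weaker itemsets are skipped.
    """
    items = sorted({item for tx in transactions for item in tx})
    heap = []        # min-heap holding the current best k itemsets
    threshold = 0    # raised as better itemsets are found
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, transactions, profit)
            if utility <= threshold:
                continue
            heapq.heappush(heap, (utility, itemset))
            if len(heap) > k:
                heapq.heappop(heap)
            if len(heap) == k:
                threshold = heap[0][0]  # threshold raising
    return sorted(heap, reverse=True)


# Hypothetical toy data: each transaction maps item -> purchased quantity.
profit = {"a": 5, "b": 2, "c": 1}
database = [
    {"a": 1, "b": 2},
    {"b": 3, "c": 4},
    {"a": 2, "c": 1},
]
before = top_k_high_utility(database, profit, k=2)

# A new batch arrives; the top-k is recomputed over the entire accumulated
# database rather than over the new batch alone.
database.extend([{"b": 5, "c": 2}])
after = top_k_high_utility(database, profit, k=2)
```

In this toy run the top-2 result shifts from itemsets containing `a` to itemsets containing `b` once the new batch is accumulated, which illustrates why mining only a recent portion of the data can give a different answer than mining the entire accumulated database.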

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
