IEEE Access (Jan 2020)

PrefixSpan Based Pattern Mining Using Time Sliding Weight From Streaming Data

  • Ji-Soo Kang,
  • Ji-Won Baek,
  • Kyungyong Chung

DOI
https://doi.org/10.1109/ACCESS.2020.3007485
Journal volume & issue
Vol. 8
pp. 124833 – 124844

Abstract

This study proposes PrefixSpan-based pattern mining that applies a time sliding weight to streaming data. To discover sequential patterns, it uses the time sliding weight to build a projected DB Tree structure. To obtain the time sliding weight, a time window is applied to the sequential data and the label and support of each window are calculated. When the projected DB Tree is built, the time weight calculated for each pattern is stored in a table. At this stage, the tree is updated by deleting every node whose time weight is below the reference value. Consequently, the tree is re-sorted whenever the data is updated. This reordering removes less influential patterns on the basis of their time weights, which makes it possible to construct a projected DB Tree from which influential patterns can be extracted. The performance of the proposed method is evaluated in three ways. First, the conventional PrefixSpan algorithm is compared with the proposed time-sliding-weight PrefixSpan algorithm in terms of pattern generation time as a function of data size. Second, the fitness of the proposed algorithm is evaluated through cross-validation. Third, the GSP, SPADE, and PrefixSpan algorithms are compared with the proposed algorithm using the F-measure. According to the evaluation, the proposed algorithm improves accuracy by 75% over the PrefixSpan algorithm and the GSP and SPADE sequential pattern algorithms. In the F-measure comparison, which is based on precision and recall, the proposed algorithm improves performance by about 83%.
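The pruning idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the weight formula, so `window_weight` assumes a linear decay over the time window, and the names `weighted_prefixspan`, `window`, and `min_weight` (standing in for the "reference value") are hypothetical. Sequences are simplified to lists of single items, and the table and re-sorting of the projected DB Tree are omitted; only the weighted support and the threshold-based pruning during projection are shown.

```python
from collections import defaultdict


def window_weight(timestamp, now, window):
    """Assumed weight: 1.0 for the newest data, decaying linearly to 0.0
    at the edge of the time window. The paper's actual formula may differ."""
    age = now - timestamp
    if age < 0 or age >= window:
        return 0.0
    return 1.0 - age / window


def weighted_prefixspan(db, now, window, min_weight):
    """Mine sequential patterns whose time-weighted support >= min_weight.

    db is a list of (timestamp, sequence) pairs, where sequence is a list
    of items. Returns a dict mapping each pattern (a tuple) to its weight.
    """
    results = {}

    def mine(prefix, projected):
        # Weighted support: each sequence contributes its time weight once
        # per distinct item in its remaining suffix.
        support = defaultdict(float)
        for weight, suffix in projected:
            for item in set(suffix):
                support[item] += weight

        for item, total in sorted(support.items()):
            if total < min_weight:
                continue  # prune patterns below the reference value
            pattern = prefix + (item,)
            results[pattern] = total
            # Project the database on the first occurrence of the item.
            next_projected = []
            for weight, suffix in projected:
                if item in suffix:
                    rest = suffix[suffix.index(item) + 1:]
                    if rest:
                        next_projected.append((weight, rest))
            mine(pattern, next_projected)

    weighted = [(window_weight(t, now, window), seq) for t, seq in db]
    mine((), [(w, s) for w, s in weighted if w > 0.0])
    return results


# Made-up example data: (timestamp, sequence) pairs.
stream = [
    (10, ["a", "b", "c"]),
    (9, ["a", "c"]),
    (2, ["a", "b"]),
    (0, ["b", "c"]),  # falls outside the window and gets weight 0
]
patterns = weighted_prefixspan(stream, now=10, window=10, min_weight=1.5)
```

In this toy run, patterns that occur mostly in older sequences (for example those starting with `b`) fall below the threshold and are dropped, while patterns supported by recent sequences survive. This mirrors, in simplified form, the removal of less influential nodes described above.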

Keywords