IEEE Access (Jan 2016)

An Effective Pattern Pruning and Summarization Method Retaining High Quality Patterns With High Area Coverage in Relational Datasets

  • Pei-Yuan Zhou,
  • Gary C. L. Li,
  • Andrew K. C. Wong

DOI
https://doi.org/10.1109/ACCESS.2016.2624418
Journal volume & issue
Vol. 4
pp. 7847 – 7858

Abstract


Pattern mining has been widely used to uncover interesting patterns from data. However, one of its main problems is that it produces too many patterns, many of which are redundant. To reduce the number of redundant patterns while retaining overlapping ones, delta-closed pattern pruning was introduced, yet it can only prune subpatterns that are covered by superpatterns. Superpatterns that are retained unduly also need to be pruned. Furthermore, to improve the management and interpretation of patterns, pattern summarization was proposed. It renders a small number of patterns that retain the most crucial information. The RuleCover algorithm is one such algorithm. However, it tends to produce overly trivial patterns, while more interesting and revealing ones may be pruned. To overcome these problems, this paper presents a new algorithm that integrates the delta-closed and RuleCover methods with two new algorithms of our own: 1) statistically induced pattern pruning, which prunes superpatterns statistically induced by strong subpatterns, and 2) the AreaCover algorithm, which prunes overlapping patterns while retaining higher-order, high-quality patterns with large coverage of the data “area.” Experimental results show that the proposed algorithms produce very compact yet comprehensive knowledge from patterns discovered in relational data sets.
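The abstract does not spell out the exact definitions, so the following is only a minimal sketch under stated assumptions, not the authors' method. Patterns are treated as sets of `attribute=value` items over rows of a relational table. Delta-closed pruning follows the common delta-tolerance definition: a pattern X is dropped when some proper superpattern Y has support(Y) ≥ (1 − delta) · support(X). The "area" of a pattern is assumed to be the set of (row, item) cells it covers, as in tiling. The greedy selection below is a stand-in that illustrates area coverage with a tie-break toward higher-order patterns; it is not the paper's AreaCover. The toy dataset and the helper names (`frequent_patterns`, `delta_closed`, `greedy_area_cover`) are hypothetical.

```python
from itertools import combinations

# Hypothetical toy relational dataset: each row is a set of attribute=value items.
ROWS = [
    {"A=1", "B=1", "C=1"},
    {"A=1", "B=1", "C=1"},
    {"A=1", "B=1", "C=2"},
    {"A=1", "B=2", "C=2"},
    {"A=2", "B=2", "C=2"},
]

def cover(pattern, rows):
    """Indices of the rows that contain every item of the pattern."""
    return frozenset(i for i, row in enumerate(rows) if pattern <= row)

def cells(pattern, rows):
    """Assumed 'area' of a pattern: the (row, item) cells it covers."""
    return {(r, item) for r in cover(pattern, rows) for item in pattern}

def frequent_patterns(rows, min_sup):
    """Brute-force enumeration of patterns with support >= min_sup (toy sizes only)."""
    items = sorted(set().union(*rows))
    out = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            p = frozenset(combo)
            if len(cover(p, rows)) >= min_sup:
                out.append(p)
    return out

def delta_closed(patterns, rows, delta):
    """Drop X if a proper superpattern Y has support(Y) >= (1 - delta) * support(X)."""
    sup = {p: len(cover(p, rows)) for p in patterns}
    return [
        x for x in patterns
        if not any(x < y and sup[y] >= (1 - delta) * sup[x] for y in patterns)
    ]

def greedy_area_cover(patterns, rows, k):
    """Pick up to k patterns, each adding the most uncovered cells; ties favour longer patterns."""
    covered, chosen = set(), []
    for _ in range(k):
        best, best_key = None, (0, 0)
        for p in patterns:
            if p in chosen:
                continue
            gain = len(cells(p, rows) - covered)
            key = (gain, len(p))  # new area first, then pattern order (length)
            if gain > 0 and key > best_key:
                best, best_key = p, key
        if best is None:
            break
        chosen.append(best)
        covered |= cells(best, rows)
    return chosen, len(covered)

if __name__ == "__main__":
    pats = frequent_patterns(ROWS, min_sup=2)
    kept = delta_closed(pats, ROWS, delta=0.25)
    chosen, area = greedy_area_cover(kept, ROWS, k=2)
    print(len(pats), len(kept), [sorted(p) for p in chosen], area)
```

With delta = 0 the pruning keeps exactly the closed patterns; raising delta lets a strong superpattern absorb subpatterns whose support is only slightly higher. In this toy run the single item `A=1` is absorbed by `{A=1, B=1}`, which is the kind of subpattern-by-superpattern pruning the abstract describes.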

Keywords