IEEE Access (Jan 2021)
Process Model Enhancement Through Capturing Important Behaviors and Rating Trace Variants
Abstract
Most process discovery algorithms assume that event logs are clean, i.e., that they contain no infrequent behaviors. However, real-life event logs often contain infrequent behaviors (i.e., outliers), which lead to quality issues in the discovered process model. At the same time, driven by recent trends such as big data and process automation, the volume of event data is rapidly increasing: an event log may contain billions of events, and some process mining algorithms and platforms may have difficulty handling logs of this size. The ever-increasing size of event data and the presence of infrequent behaviors in event logs are thus two main challenges in process discovery today. However, little research has addressed filtering infrequent behaviors and reducing the size of the event log simultaneously. Existing filtering methods can remove infrequent behaviors, but the volume of the filtered log is still considerable; sampling methods can reduce the size of the event log, but the sampled log may still contain infrequent behaviors. Therefore, this paper proposes a technique that simultaneously filters infrequent behaviors and controls the volume of the input log by capturing important behaviors and rating trace variants. Our experiments show that the approach can significantly improve the quality of the discovered process models. Furthermore, it can obtain a better process model from 0.001% of the trace variants than from the complete event log, and it significantly reduces the runtime of discovery algorithms.
Keywords