Algorithms (May 2022)
Data Preprocessing Method and API for Mining Processes from Cloud-Based Application Event Logs
Abstract
Process mining (PM) exploits event logs to obtain meaningful information about the processes that produced them. As the number of applications developed on cloud infrastructures is increasing, it becomes important to study and discover their underlying processes. However, many current PM technologies face challenges in dealing with complex and large event logs from cloud applications, especially when they have little structure (e.g., clickstreams). By using Design Science Research, this paper introduces a new method, called cloud pattern API-process mining (CPA-PM), which enables the discovery and analysis of cloud-based application processes using PM in a way that addresses many of these challenges. CPA-PM exploits a new application programming interface, with an R implementation, for creating repeatable scripts that preprocess event logs collected from such applications. Applying CPA-PM to a case with real and evolving event logs related to the trial process of a software-as-a-service cloud application led to useful analyses and insights, with reusable scripts. CPA-PM helps producing executable scripts for filtering event logs from clickstream and cloud-based applications, where the scripts can be used in pipelines while minimizing the need for error-prone and time-consuming manual filtering.
Keywords