IEEE Access (Jan 2024)

Comparative Analysis of Classification-Based and Regression-Based Predictive Process Monitoring Models for Accurate and Time-Efficient Remaining Time Prediction

  • Reza Aalikhani,
  • Mohammad Fathian,
  • Mohammad Reza Rasouli

DOI
https://doi.org/10.1109/ACCESS.2024.3397185
Journal volume & issue
Vol. 12
pp. 67063 – 67093

Abstract

Predictive Process Monitoring (PPM) techniques leverage incomplete execution traces and historical event logs to predict outcomes, activities, and remaining time in ongoing processes. Accurately predicting process remaining time benefits process managers by enabling proactive decisions. A prediction model’s effectiveness depends not only on accuracy but also on how quickly predictions are delivered. Because time is continuous, regression-based approaches have dominated the literature, leaving the potential of classification-based methods largely unexplored. This study performs a comparative analysis of Classification-Based PPM (CB-PPM) models and Regression-Based PPM (RB-PPM) models. The focus is on predicting remaining time in various processes, with accuracy, offline execution time (model training), and online execution time (real-time predictions) as the key evaluation criteria. We also assess the impact of model configuration on the performance of the prediction models. To accomplish this, our methodology includes designing experiments and implementing 136 PPM models on ten real-world datasets. These models were configured with various combinations of four bucketing methods, five encoding methods, and eight prediction algorithms. The TOPSIS analysis shows that the CB-PPM method is used in 90% of the most suitable models, whereas the RB-PPM method appears in only 10% of these models. The hypothesis testing results confirm that the CB-PPM method surpasses the RB-PPM method, significantly enhancing the accuracy of remaining time prediction. Although the CB-PPM method has a higher online execution time, no increase in offline execution time is observed. Furthermore, this study emphasizes the dataset-dependent nature of model configurations, underscoring that a single configuration may not universally apply to all datasets.
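To make the ranking step concrete, the following is a minimal sketch of how a TOPSIS score over the three evaluation criteria (prediction error, offline execution time, online execution time) could be computed. It is an illustration under stated assumptions, not the authors' implementation: the function name `topsis`, the equal weights, the model labels, and all numeric values are hypothetical, and the abstract does not state how the criteria were weighted or normalized.

```python
import math

def topsis(matrix, weights, benefit):
    """Return TOPSIS closeness scores (higher is better).

    matrix  -- one row per candidate model, one column per criterion
    weights -- one weight per criterion (assumed here, not from the paper)
    benefit -- True where larger values are better, False for cost criteria
    """
    n_crit = len(weights)
    # Vector-normalize each criterion column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
             for j in range(n_crit)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)]
                for row in matrix]
    # Ideal and anti-ideal points depend on the direction of each criterion.
    columns = list(zip(*weighted))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    scores = []
    for row in weighted:
        d_best = math.dist(row, ideal)   # Euclidean distance to the ideal
        d_worst = math.dist(row, anti)   # Euclidean distance to the anti-ideal
        total = d_best + d_worst
        # If all candidates are identical, fall back to a neutral score.
        scores.append(d_worst / total if total else 0.5)
    return scores

# Hypothetical candidates: (error, offline seconds, online milliseconds).
# All three criteria are costs, so lower is better.
candidates = {
    "model_A": [4.2, 310.0, 1.8],
    "model_B": [5.1, 290.0, 0.9],
    "model_C": [6.0, 450.0, 2.5],
}
scores = topsis(list(candidates.values()),
                weights=[1 / 3, 1 / 3, 1 / 3],
                benefit=[False, False, False])
ranking = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
```

Sorting by the closeness score gives the order in which candidate models would be preferred under these assumed weights; changing the weights shifts the trade-off between accuracy and execution time.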

Keywords