IEEE Access (Jan 2022)
A Sampling-Based Stack Framework for Imbalanced Learning in Churn Prediction
Abstract
Churn prediction is gaining popularity in the research community as a powerful paradigm that supports data-driven operational decisions. Datasets related to churn prediction are often skewed, with an imbalanced class distribution. Data-level solutions, such as over-sampling and under-sampling, have been commonly used by researchers to address this problem. Few case studies attempt to extend these data-level solutions by integrating them with computationally advanced frameworks such as ensembles. Ensembles primarily employ algorithmic diversity over a fixed set of training instances to achieve superior performance. This study aims to introduce algorithmic diversity in ensembles by modifying the fixed set of training instances with diverse sampling strategies, in order to increase predictive performance in imbalanced learning. Data is acquired from the world’s largest open hotel commerce platform company. A four-part series of experiments is conducted to analyze the effect of sampling techniques and ensemble solutions on model performance. A new sampling-based stack framework called “Stacking of Samplers for Imbalanced Learning” is proposed. The framework combines the prediction capabilities of sampling solutions to increase the information gain of the meta features in the ensemble. The proposed framework improves model performance, reaching an AUC of 86.4% and a top-decile lift of 4.7 for customers of the hotel technology provider. Additionally, the results show that the framework records a higher information gain for the meta features used in a stack than commonly used stack frameworks do.
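For readers unfamiliar with the top-decile lift reported above, the following is a minimal sketch of how the metric is commonly computed: the churn rate among the 10% of customers with the highest predicted scores, divided by the overall churn rate. The function name `top_decile_lift` and the toy data are illustrative assumptions, not taken from the study, and the study's exact evaluation procedure may differ.

```python
def top_decile_lift(y_true, scores):
    """Churn rate in the top 10% of customers ranked by score,
    divided by the overall churn rate.

    y_true: list of 0/1 labels (1 = churned); scores: predicted churn scores.
    Hypothetical helper for illustration only.
    """
    if len(y_true) != len(scores) or len(y_true) == 0:
        raise ValueError("y_true and scores must be non-empty and equal in length")
    overall_rate = sum(y_true) / len(y_true)
    if overall_rate == 0:
        raise ValueError("lift is undefined when there are no churners")
    # Rank customers from highest to lowest predicted churn score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    # Size of the top decile (at least one customer).
    k = max(1, len(ranked) // 10)
    top_rate = sum(label for _, label in ranked[:k]) / k
    return top_rate / overall_rate


# Toy data (assumed): 20 customers, 4 churners, scores already in descending order.
labels = [1, 1, 0, 1, 0, 0, 0, 1] + [0] * 12
scores = [1.0 - 0.05 * i for i in range(20)]
print(top_decile_lift(labels, scores))
```

In this toy example the top decile holds two customers, both churners, against an overall churn rate of 20%, so the model concentrates churners five times more densely in the top decile than random targeting would. A lift of 4.7 reads the same way.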
Keywords