Scientific African (Mar 2024)

Empirical analysis of tree-based classification models for customer churn prediction

  • Fatima E. Usman-Hamza,
  • Abdullateef O. Balogun,
  • Salahdeen K. Nasiru,
  • Luiz Fernando Capretz,
  • Hammed A. Mojeed,
  • Shakirat A. Salihu,
  • Abimbola G. Akintola,
  • Modinat A. Mabayoje,
  • Joseph B. Awotunde

Journal volume & issue
Vol. 23
p. e02054

Abstract

Customer churn is a critical and recurring problem for most industries, particularly telecommunications. Given the fierce competition among telecommunications firms and the high cost of acquiring new subscribers, retaining existing subscribers is crucial. Early prediction of disgruntled subscribers can help firms identify the reasons for churn and deploy suitable policies to boost productivity, maintain market competitiveness, and reduce financial losses. Developing efficient and dependable customer churn prediction (CCP) solutions is therefore imperative. Existing CCP research has proposed several strategies, including rule-based and machine-learning (ML) approaches. However, rule-based CCP solutions lack flexibility and robustness, and the skewed class distribution of churn datasets degrades the efficacy of most traditional ML techniques in CCP. Nonetheless, ML-based CCP solutions have been reported to be more effective than other forms of CCP solutions. Unlike linear-based, instance-based, and function-based ML classifiers, tree-based ML classifiers are known to generate predictive models with high accuracy, high stability, and ease of interpretation. However, the deployment of tree-based classifiers for CCP is limited in most cases to the decision tree (DT) and random forest (RF). Hence, this research investigated the effectiveness of tree-based classifiers with diverse computational properties in CCP. Specifically, the CCP performance of single, ensemble, enhanced, and hybrid tree-based classifiers was investigated. In addition, the effect of data quality problems such as the class imbalance problem (CIP) on the predictive performance of tree-based classifiers and their homogeneous ensemble variants in CCP was assessed. The experimental results showed that the investigated tree-based classifiers outperformed other forms of classifiers, namely linear-based (Support Vector Machine (SVM)), instance-based (K-Nearest Neighbour (KNN)), Bayesian-based (Naïve Bayes (NB)), and function-based (MultiLayer Perceptron (MLP)) classifiers, in most cases, with or without the CIP. The CIP also had a significant effect on the CCP performance of the investigated tree-based classifiers, but combining a data sampling technique with a homogeneous ensemble method can be an effective solution to the CIP and can yield efficient CCP models.
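
The closing idea of the abstract, pairing a data sampling step with a homogeneous ensemble of tree-based learners, can be illustrated with a minimal sketch. The abstract does not name the specific sampling technique or ensemble method used, so everything here is an assumption chosen for illustration: random oversampling stands in for the sampling step, bagging of depth-1 decision trees (stumps) stands in for the ensemble, and the toy data, feature meanings, and function names are hypothetical.

```python
import random
from collections import Counter

def oversample(X, y, rng):
    """Random oversampling: duplicate minority rows until classes are balanced."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [row for row, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out

def fit_stump(X, y):
    """Fit a depth-1 decision tree by minimising training errors."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
            errors = (sum(lab != l_lab for lab in left)
                      + sum(lab != r_lab for lab in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    return best[1:]  # (feature index, threshold, left label, right label)

def predict_stump(stump, row):
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab

def fit_bagging(X, y, n_estimators=15, seed=0):
    """Balance the data first, then train each stump on a bootstrap sample."""
    rng = random.Random(seed)
    X_bal, y_bal = oversample(X, y, rng)
    n = len(X_bal)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([X_bal[i] for i in idx],
                                [y_bal[i] for i in idx]))
    return stumps

def predict_bagging(stumps, row):
    """Majority vote over the ensemble members."""
    votes = Counter(predict_stump(s, row) for s in stumps)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: [complaints per month, tenure in months];
# label 1 = churned. 18 non-churners versus 4 churners (imbalanced).
X = [[i % 3, 10 + i] for i in range(18)] + [[6 + i, 1 + i] for i in range(4)]
y = [0] * 18 + [1] * 4

X_bal, y_bal = oversample(X, y, random.Random(0))
model = fit_bagging(X, y)
```

In this sketch the oversampling happens before bootstrapping, so every ensemble member sees roughly balanced classes. A real study would normally apply sampling to the training folds only and would evaluate with imbalance-aware metrics rather than plain accuracy.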

Keywords