Empirical analysis of tree-based classification models for customer churn prediction

Fatima E. Usman-Hamza; Abdullateef O. Balogun; Salahdeen K. Nasiru; Luiz Fernando Capretz; Hammed A. Mojeed; Shakirat A. Salihu; Abimbola G. Akintola; Modinat A. Mabayoje; Joseph B. Awotunde

Scientific African (Mar 2024)

Affiliations

Fatima E. Usman-Hamza: Department of Computer Science, University of Ilorin, Ilorin 1515 Ilorin, Nigeria
Abdullateef O. Balogun: Department of Computer and Information Science, Universiti Teknologi PETRONAS, Bandar Seri Iskandar, 32610 Perak, Malaysia; Corresponding author.
Salahdeen K. Nasiru: Department of Computer Science, University of Ilorin, Ilorin 1515 Ilorin, Nigeria
Luiz Fernando Capretz: Department of Electrical and Computer Engineering, Western University, London, Ontario, N6A 5B9, Canada
Hammed A. Mojeed: Department of Computer Science, University of Ilorin, Ilorin 1515 Ilorin, Nigeria; Department of Technical Informatics and Telecommunications, Gdańsk University of Technology, Gabriela Narutowicza 11/12, 80-233 Gdańsk, Poland
Shakirat A. Salihu: Department of Computer Science, University of Ilorin, Ilorin 1515 Ilorin, Nigeria
Abimbola G. Akintola: Department of Computer Science, University of Ilorin, Ilorin 1515 Ilorin, Nigeria
Modinat A. Mabayoje: Department of Computer Science, University of Ilorin, Ilorin 1515 Ilorin, Nigeria
Joseph B. Awotunde: Department of Computer Science, University of Ilorin, Ilorin 1515 Ilorin, Nigeria

Journal volume & issue: Vol. 23, p. e02054

Abstract

Customer churn is a critical and recurring problem in most industries, particularly telecommunications. Given the fierce competition among telecommunications firms and the high cost of acquiring new subscribers, retaining existing loyal subscribers is crucial. Early prediction of disgruntled subscribers can help telecommunications firms identify the reasons for churn and deploy suitable policies to boost productivity, maintain market competitiveness, and reduce financial losses. Developing efficient and dependable customer churn prediction (CCP) solutions is imperative to attaining this goal. Existing CCP research has proposed several strategies, including rule-based and machine-learning (ML) approaches. However, rule-based CCP solutions lack flexibility and robustness, which is a fundamental shortcoming, and the skewed class distribution of churn datasets degrades the efficacy of most traditional ML techniques in CCP. Nevertheless, ML-based CCP solutions have been reported to be more effective than other forms of CCP solutions. Unlike linear-based, instance-based, and function-based ML classifiers, tree-based ML classifiers are known to generate predictive models with high accuracy, high stability, and ease of interpretation. However, the deployment of tree-based classifiers for CCP is limited in most cases to the decision tree (DT) and random forest (RF). Hence, this research investigated the effectiveness of tree-based classifiers with diverse computational properties in CCP. Specifically, the CCP performance of single, ensemble, enhanced, and hybrid tree-based classifiers was investigated. The effects of data quality problems such as the class imbalance problem (CIP) on the predictive performance of tree-based classifiers and their homogeneous ensemble variants in CCP were also assessed. The experimental results showed that the investigated tree-based classifiers outperformed other forms of classifiers, such as linear-based (Support Vector Machine (SVM)), instance-based (K-Nearest Neighbour (KNN)), Bayesian-based (Naïve Bayes (NB)), and function-based (Multilayer Perceptron (MLP)) classifiers, in most cases, with or without the CIP. The CIP was also observed to have a significant effect on the CCP performance of the investigated tree-based classifiers, but combining a data sampling technique with a homogeneous ensemble method can be an effective solution to the CIP and can also generate efficient CCP models.

Published in Scientific African

ISSN: 2468-2276 (Online)
Publisher: Elsevier
Country of publisher: Netherlands
LCC subjects: Science
Website: https://www.journals.elsevier.com/scientific-african
