Journal of Theoretical and Applied Electronic Commerce Research (Jul 2022)
Customer Response Model in Direct Marketing: Solving the Problem of Unbalanced Dataset with a Balanced Support Vector Machine
Abstract
Customer response models have gained popularity because they can significantly improve the likelihood of targeting the customers most likely to buy a product or service. These models are built from databases of previous customers’ buying decisions. However, the customers in these databases who bought the product or service are usually fewer than those who did not, which results in unbalanced datasets. The problem is especially significant for online marketing campaigns, where class imbalance arises from the large number of website sessions. Unbalanced datasets pose a specific challenge in data-mining modelling because most algorithms fail to capture the characteristics of classes that are underrepresented in the dataset. This paper proposes an approach that combines random undersampling with Support Vector Machine (SVM) classification, applied to the unbalanced dataset, to create a Balanced SVM (B-SVM) data pre-processor. The pre-processed dataset is then analysed with several classifiers. The experiments indicate that combining the B-SVM strategy with classification methods increases the base models’ predictive performance, which suggests that the B-SVM approach pre-processes the data efficiently, correcting both noise and class imbalance. Hence, companies may use the B-SVM approach to select more efficiently the customers who are more likely to respond to a campaign.
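To make the first ingredient of the approach concrete, the following is a minimal sketch of random undersampling only, written with the Python standard library. It is not the authors' implementation: the function name `random_undersample`, the toy data, and the choice to shrink every class to the size of the smallest one are assumptions made for illustration. The SVM step of B-SVM is not shown, since the abstract does not specify its details.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Return a class-balanced subset by randomly dropping majority-class rows.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    # Assumption: every class is cut down to the size of the smallest class.
    target = min(len(idxs) for idxs in by_class.values())

    kept = []
    for idxs in by_class.values():
        kept.extend(rng.sample(idxs, target))
    kept.sort()  # preserve the original row order

    return [rows[i] for i in kept], [labels[i] for i in kept]


# Toy data (made up): 2 responders (label 1) among 10 customers.
rows = [[i, i % 3] for i in range(10)]
labels = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]

balanced_rows, balanced_labels = random_undersample(rows, labels, seed=42)
print(Counter(balanced_labels))
```

In this sketch both responders are always kept and two non-responders are drawn at random, so the returned subset has equal class counts. In a pipeline along the lines the abstract describes, such a balanced subset would be the input on which the SVM is trained.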
Keywords