Research in Statistics (Feb 2024)

An interpretable predictive model for bank customers’ income using the eXtreme Gradient Boosting algorithm and the SHAP method: a case study of an Anonymous Chilean Bank

  • Patricio Salas,
  • Patricio Sáez,
  • Vicente Marchant

DOI
https://doi.org/10.1080/27684520.2024.2312290
Journal volume & issue
Vol. 2, no. 1

Abstract


In the dynamic landscape of banking institutions, acquiring accurate and timely information about customers' incomes is crucial for effectively managing financial product offerings. To meet this demand, these institutions build predictive models from numerous features, of which only a subset actually captures income variability. In this study, we propose a methodology for predicting monthly incomes with an XGBoost model that uses a reduced number of features. Feature reduction is accomplished with Boruta and BorutaSHAP, ensuring that no predictive power is lost in the process. To make the model's predictions transparent, we use the Shapley Additive Explanations (SHAP) method. The dataset was provided by an anonymous Chilean bank and consists of 10,000 records and 426 features, with a substantial proportion of missing values. The results demonstrate that combining feature selection methods with the XGBoost algorithm yields a more concise model that maintains predictive performance. By leveraging the SHAP method, financial institutions can consistently identify and track influential features, reducing complexity and training time without compromising predictive power. This research therefore offers practical value to financial institutions, which can adopt the methodology to consistently identify and track the most influential features.
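The abstract relies on SHAP to explain individual predictions. SHAP attributions are Shapley values: each feature receives its weighted average marginal contribution over all coalitions of the other features, and the attributions sum to the gap between the explained prediction and a reference prediction. The minimal sketch below computes exact Shapley values by brute-force enumeration for a hypothetical three-feature toy model. `toy_income_model`, its coefficients, and the input values are made up for illustration and are not taken from the paper. It assumes a single baseline point as the reference, which is a simplification. Enumeration costs O(2^n) model calls, so it only works for a handful of features; the SHAP library computes comparable attributions efficiently for tree ensembles such as XGBoost instead of enumerating coalitions.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to one baseline point.

    Features outside a coalition are set to their baseline value (a simple
    interventional value function). Cost grows as 2**n, so this is only an
    illustration for very few features.
    """
    n = len(x)

    def value(coalition):
        # Take x's values for features in the coalition, baseline values elsewhere.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_income_model(z):
    # Hypothetical model: two additive effects plus an interaction of features 0 and 2.
    return 1000 + 3 * z[0] + 200 * z[1] + 0.5 * z[0] * z[2]


x = [100, 2, 10]        # hypothetical customer to explain
baseline = [0, 0, 0]    # hypothetical reference point

phi = shapley_values(toy_income_model, x, baseline)
print([round(v, 6) for v in phi])  # [550.0, 400.0, 250.0]
# Additivity: the attributions sum to f(x) - f(baseline) = 2200 - 1000 = 1200.
print(round(sum(phi), 6))          # 1200.0
```

The linear terms go entirely to their own features (300 to feature 0, 400 to feature 1), while the 500-unit interaction term is split evenly between features 0 and 2. This additive, per-feature breakdown is what makes it possible to rank and track influential features across predictions.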

Keywords