PLoS ONE (Jan 2023)

Combination of unsupervised discretization methods for credit risk.

  • José G Fuentes Cabrera,
  • Hugo A Pérez Vicente,
  • Sebastián Maldonado,
  • Jonás Velasco

DOI
https://doi.org/10.1371/journal.pone.0289130
Journal volume & issue
Vol. 18, no. 11
p. e0289130

Abstract

Read online

Creating robust and explainable statistical learning models is essential in credit risk management. For this purpose, equally spaced or frequent discretization is the de facto choice when building predictive models. The methods above have limitations, given that when the discretization procedure is constrained, the underlying patterns are lost. This study introduces an innovative approach by combining traditional discretization techniques with clustering-based discretization, specifically k means and Gaussian mixture models. The study proposes two combinations: Discrete Competitive Combination (DCC) and Discrete Exhaustive Combination (DEC). Discrete Competitive Combination selects features based on the discretization method that performs better on each feature, whereas Discrete Exhaustive Combination includes every discretization method to complement the information not captured by each technique. The proposed combinations were tested on 11 different credit risk datasets by fitting a logistic regression model using the weight of evidence transformation over the training partition and contrasted over the validation partition. The experimental findings showed that both combinations similarly outperform individual methods for the logistic regression without compromising the computational efficiency. More importantly, the proposed method is a feasible and competitive alternative to conventional methods without reducing explainability.