Systems (Jul 2024)
XGBoost-B-GHM: An Ensemble Model with Feature Selection and GHM Loss Function Optimization for Credit Scoring
Abstract
Credit evaluation is a core task in the financial field, yet existing credit evaluation methods struggle with redundant data features and imbalanced samples. To address these issues, this paper proposes an ensemble model that combines an advanced feature selection algorithm with an optimized loss function; it can be applied to credit evaluation to improve the risk management ability of financial institutions. First, the Boruta algorithm is embedded for feature selection: by automatically identifying and retaining the features that are strongly associated with the target variable, it reduces data dimensionality and noise and improves the model’s capacity for generalization. Then, the GHM loss function is incorporated into the XGBoost model to tackle the skewed sample distribution that is common in classification, further improving the model’s classification and prediction performance. Comparative experiments on four large datasets demonstrate that the proposed method outperforms existing mainstream methods and can effectively extract features and handle imbalanced samples.
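To make the loss-reweighting idea concrete, the following is a minimal sketch of a GHM-style objective for a gradient-boosted binary classifier. It assumes that GHM here denotes the gradient harmonizing mechanism in its classification form, where each sample is weighted inversely to the number of samples that share its gradient-norm bin. The abstract does not specify how the loss is integrated into XGBoost, so the function names `ghm_weights` and `ghm_logistic_objective`, the bin count, and the choice to treat the weights as constants when forming the gradient and Hessian are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def ghm_weights(y_true, p, bins=10):
    """Per-sample weights from gradient-norm density (hypothetical helper)."""
    # For sigmoid cross-entropy, the gradient norm w.r.t. the raw score is |p - y|.
    g = np.abs(p - y_true)
    edges = np.linspace(0.0, 1.0, bins + 1)
    # Bin index per sample; clip so that g == 1.0 lands in the last bin.
    idx = np.clip(np.digitize(g, edges) - 1, 0, bins - 1)
    counts = np.bincount(idx, minlength=bins)
    n_nonempty = np.count_nonzero(counts)
    # Crowded bins (typically easy majority samples) receive small weights.
    # With this normalization the weights sum to the number of samples.
    return len(g) / (counts[idx] * n_nonempty)


def ghm_logistic_objective(y_true, raw_score, bins=10):
    """Gradient and Hessian of GHM-weighted log loss (weights held constant)."""
    p = 1.0 / (1.0 + np.exp(-raw_score))
    w = ghm_weights(y_true, p, bins)
    grad = w * (p - y_true)
    hess = w * p * (1.0 - p)
    return grad, hess


# Small imbalanced example: three easy negatives and one hard positive.
y = np.array([0.0, 0.0, 0.0, 1.0])
scores = np.array([-3.0, -3.0, -3.0, -3.0])
grad, hess = ghm_logistic_objective(y, scores)
```

In this example the single positive sample sits alone in its gradient-norm bin and therefore receives a larger weight than each of the three negatives. A function of this shape could be wrapped as a custom objective for XGBoost's native training API, which accepts a callable that takes the current predictions and the training `DMatrix` and returns the gradient and Hessian.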
Keywords