Feature Enhanced Ensemble Modeling With Voting Optimization for Credit Risk Assessment

Dongqi Yang; Binqing Xiao

doi:10.1109/ACCESS.2024.3445499

IEEE Access (Jan 2024)

Affiliations

Dongqi Yang: School of Management and Engineering, Nanjing University, Nanjing, China
Binqing Xiao: School of Management and Engineering, Nanjing University, Nanjing, China

DOI: https://doi.org/10.1109/ACCESS.2024.3445499
Journal volume & issue: Vol. 12, pp. 115124 – 115136

Abstract

Machine learning methods are widely used in small and micro enterprise (SME) credit risk assessment. However, their practical application faces a trade-off between accuracy and interpretability. In this study, a multi-stage ensemble model is proposed to enhance the model’s interpretability. To strengthen predictive portraits, a multi-feature enhancement method is proposed that integrates non-financial behavioral information and soft information on credit rating into the annual loan ledger data, thereby bolstering the explanatory capacity of the features. To address data imbalance while avoiding information loss, a new bagging-based oversampling method is proposed that oversamples the minority class within multiple parallel subsets produced by the bagging strategy. To unleash the performance potential of the base classifiers, a new voting-weight optimization method is proposed that optimizes the soft voting weights of the candidate base classifiers. Experimental results on an annual loan ledger dataset from a commercial bank in China (an accuracy of 97.9%, an area under the curve of 0.97, a logistic loss of 0.07, a Brier score of 0.01, and a Kolmogorov-Smirnov statistic of 0.38) and on five other public datasets indicate excellent model fit. By focusing on the soft information and data structures characteristic of SME loan risk assessment data, an additional SHAP model explanation method enhances interpretability. This method reveals that the enhanced ‘debt-to-income ratio,’ along with non-financial behavioral information and features derived from soft information, is essential for predicting loan defaults. Such enhancements help to alleviate the issue of information asymmetry in SME loan risk assessment.
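The abstract does not specify how the soft voting weights are optimized. As a rough illustration of the general idea only, the sketch below averages the positive-class probabilities of several base classifiers with non-negative weights and picks the weights by random search over the simplex, using logistic loss as the objective. The function names (`soft_vote`, `log_loss`, `search_weights`), the random-search strategy, and the toy probabilities are all assumptions made for this example; they are not the authors' method or data.

```python
import numpy as np


def soft_vote(probas, weights):
    """Weighted average of per-classifier positive-class probabilities.

    probas:  array of shape (n_classifiers, n_samples)
    weights: array of shape (n_classifiers,), non-negative
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    return weights @ np.asarray(probas, dtype=float)


def log_loss(y_true, p, eps=1e-12):
    """Binary logistic loss, with clipping to avoid log(0)."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))


def search_weights(probas, y_true, n_trials=2000, seed=0):
    """Hypothetical optimizer: random search over the weight simplex.

    Starts from equal weights and keeps whichever candidate gives the
    lowest logistic loss on the supplied labels.
    """
    rng = np.random.default_rng(seed)
    n = len(probas)
    best_w = np.full(n, 1.0 / n)
    best_loss = log_loss(y_true, soft_vote(probas, best_w))
    for _ in range(n_trials):
        w = rng.dirichlet(np.ones(n))  # random point on the simplex
        loss = log_loss(y_true, soft_vote(probas, w))
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w, best_loss


# Toy validation labels and made-up classifier outputs (illustration only).
y = np.array([0, 0, 1, 1, 0, 1])
probas = np.array([
    [0.1, 0.2, 0.9, 0.8, 0.1, 0.7],  # a fairly sharp hypothetical classifier
    [0.4, 0.6, 0.5, 0.6, 0.5, 0.4],  # a weak one
    [0.3, 0.2, 0.6, 0.7, 0.4, 0.6],  # somewhere in between
])

weights, loss = search_weights(probas, y)
print("weights:", np.round(weights, 3), "log loss:", round(loss, 4))
```

In practice the weights would be fitted on held-out validation predictions rather than on the training set, and a gradient-based or evolutionary optimizer could replace the random search.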

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639