An Improved Ensemble Method With Data Resampling for Credit Risk Prediction

Idowu Aruleba; Yanxia Sun

doi:10.1109/ACCESS.2025.3563432

IEEE Access (Jan 2025)

Affiliations

Idowu Aruleba: Department of Electrical and Electronic Engineering Science, University of Johannesburg, Johannesburg, South Africa
Yanxia Sun: Department of Electrical and Electronic Engineering Science, University of Johannesburg, Johannesburg, South Africa

DOI: https://doi.org/10.1109/ACCESS.2025.3563432
Journal volume & issue: Vol. 13
pp. 71275 – 71287

Abstract

The increasing complexity and dynamic nature of financial data make accurate credit risk prediction, a critical task in the banking and finance sector, a significant challenge. The application of machine learning (ML) to credit risk prediction has also been hindered by the imbalanced nature of credit datasets. This study proposes an improved approach to credit risk prediction that combines a stacked ensemble with a hybrid data resampling technique. The ensemble uses random forest, logistic regression, and a convolutional neural network (CNN) as base learners, with a multilayer perceptron (MLP) serving as the meta-learner. To address the class imbalance, the hybrid Synthetic Minority Over-sampling Technique and Edited Nearest Neighbors (SMOTE-ENN) method was applied. The proposed approach is benchmarked against other well-performing classifiers, including random forest, logistic regression, MLP, and CNN. Integrating hybrid data resampling with a robust stacking ensemble significantly enhanced credit risk prediction: the proposed approach achieved a sensitivity and specificity of 0.921 and 0.946 on the Australian dataset and 0.928 and 0.891 on the German dataset. On the Credit Risk Classification dataset, the stacked classifier achieved an accuracy of 0.7644 with a sensitivity of 0.000 and a specificity of 1.000 before data resampling; after resampling, the accuracy, sensitivity, and specificity were 0.8056, 0.7989, and 0.8125, respectively. On the credit risk analysis for the extended banking loans dataset, the accuracy, sensitivity, and specificity of the stacked classifier were 0.8429, 0.6316, and 0.9216 before data resampling and 0.9632, 1.0000, and 0.9242 after resampling. These results show that, after data resampling, the stacked classifier trained on the credit risk analysis for the extended banking loans dataset outperformed the other models.
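
The pre-resampling figures reported for the Credit Risk Classification dataset (accuracy 0.7644, sensitivity 0.000, specificity 1.000) are the pattern a classifier produces when it assigns every sample to the majority class. The following minimal Python sketch illustrates that arithmetic. It is not the authors' code: the function names, the choice of label 1 as the positive (minority) class, and the 7,644 / 2,356 class split are illustrative assumptions chosen only to mirror the reported accuracy.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def rates(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, specificity)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    # Sensitivity: share of actual positives that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: share of actual negatives that were correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / len(y_true)
    return accuracy, sensitivity, specificity


# Hypothetical imbalanced sample: 7,644 negatives (0) and 2,356 positives (1).
# This split is an assumption for illustration, not data from the paper.
y_true = [0] * 7644 + [1] * 2356

# A degenerate classifier that always predicts the majority (negative) class.
y_pred = [0] * len(y_true)

print(rates(y_true, y_pred))  # (0.7644, 0.0, 1.0)
```

Under these assumptions, accuracy simply equals the majority-class share while no positive case is detected, which is why sensitivity and specificity are reported alongside accuracy for imbalanced credit data.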

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
