NOTE: non-parametric oversampling technique for explainable credit scoring

Seongil Han; Haemin Jung; Paul D. Yoo; Alessandro Provetti; Andrea Cali

doi:10.1038/s41598-024-78055-5

Scientific Reports (Oct 2024)

Affiliations

Seongil Han: School of Computing & Mathematical Sciences, University of London, Birkbeck College
Haemin Jung: Department of Industrial & Management Engineering, Korea National University of Transportation
Paul D. Yoo: School of Computing & Mathematical Sciences, University of London, Birkbeck College
Alessandro Provetti: School of Computing & Mathematical Sciences, University of London, Birkbeck College
Andrea Cali: School of Computing & Mathematical Sciences, University of London, Birkbeck College

DOI: https://doi.org/10.1038/s41598-024-78055-5
Journal volume & issue: Vol. 14, no. 1
pp. 1 – 18

Abstract

Credit scoring models are critical for financial institutions to assess borrower risk and maintain profitability. Although machine learning models have improved credit scoring accuracy, imbalanced class distributions remain a major challenge. The widely used Synthetic Minority Oversampling TEchnique (SMOTE) struggles with high-dimensional, non-linear data and may introduce noise through class overlap. Generative Adversarial Networks (GANs) have emerged as an alternative because they can model complex data distributions. Conditional Wasserstein GANs (cWGANs) have shown promise in handling both the numerical and categorical features found in credit scoring datasets. However, research on extracting latent features from non-linear data and on improving model explainability remains limited. To address these challenges, this paper introduces the Non-parametric Oversampling Technique for Explainable credit scoring (NOTE). NOTE offers a unified approach that integrates a Non-parametric Stacked Autoencoder (NSA) to capture non-linear latent features, a cWGAN to oversample the minority class, and a classification process designed to enhance explainability. The experimental results demonstrate that NOTE surpasses state-of-the-art oversampling techniques by improving classification accuracy and model stability, particularly on non-linear and imbalanced credit scoring datasets, while also enhancing the explainability of the results.
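To make the SMOTE limitation mentioned in the abstract concrete, here is a minimal Python sketch of the standard SMOTE interpolation rule. Each synthetic point is placed at a random position on the straight segment between a minority sample and one of its k nearest minority neighbours. The function names `smote` and `k_nearest`, the parameters `k` and `seed`, and the toy data are illustrative assumptions. This is not the authors' baseline code, and it does not implement NOTE (NSA or cWGAN).

```python
import math
import random


def k_nearest(idx, points, k):
    """Return the k points nearest to points[idx] (Euclidean), excluding itself."""
    ranked = sorted(
        (math.dist(points[idx], p), j) for j, p in enumerate(points) if j != idx
    )
    return [points[j] for _, j in ranked[:k]]


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples with SMOTE-style interpolation.

    minority: list of equal-length numeric tuples (minority-class feature vectors).
    Each synthetic sample is base + gap * (neighbour - base), where gap is drawn
    uniformly from [0, 1) and neighbour is one of base's k nearest minority samples.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)  # seeded so results are reproducible
    synthetic = []
    for _ in range(n_new):
        idx = rng.randrange(len(minority))
        base = minority[idx]
        neighbour = rng.choice(k_nearest(idx, minority, k))
        gap = rng.random()
        # Linear interpolation: the new point lies on the base-neighbour segment.
        synthetic.append(
            tuple(b + gap * (nb - b) for b, nb in zip(base, neighbour))
        )
    return synthetic


if __name__ == "__main__":
    # Hypothetical toy minority class with four 2-D points.
    minority = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0), (2.5, 2.5)]
    for point in smote(minority, n_new=3, k=2):
        print(point)
```

Because every synthetic point is a convex combination of two minority samples, it always lies on a straight line between them. When the minority class has a non-linear shape or overlaps the majority class, those segments can cross majority-class regions, which is one way SMOTE introduces the noise described in the abstract.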

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/