Journal of Social Computing (Sep 2024)

Credit Risk Prediction Based on Improved ADASYN Sampling and Optimized LightGBM

  • Mei Song,
  • He Ma,
  • Yi Zhu,
  • Mengdi Zhang

DOI
https://doi.org/10.23919/JSC.2024.0019
Journal volume & issue
Vol. 5, no. 3
pp. 232 – 241

Abstract

A credit risk prediction model named KM-ADASYN-TL-FLLightGBM (KADT-FLightGBM) is proposed in this study. First, to overcome the limitations of traditional sampling methods on imbalanced datasets, an improved ADASYN sampling method that incorporates the K-means clustering algorithm is constructed, and the Tomek Links method is used to filter the generated samples. Second, an optimized LightGBM algorithm with Focal Loss is employed to train the model on the dataset obtained from the improved ADASYN sampling. Finally, a comparative analysis between the proposed ensemble model and other sampling methods is conducted on the Lending Club dataset. The results demonstrate that the proposed model effectively reduces the misclassification of minority-class samples in credit risk prediction and can serve as a reference for similar studies.
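The abstract does not give the exact form of the Focal Loss used to optimize LightGBM. As a rough illustration only, the sketch below assumes the standard binary focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). The function names and the values of `gamma` and `alpha` are illustrative assumptions, not settings taken from the paper.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for a single sample (standard form, assumed here).

    p     : predicted probability of the positive (minority) class, 0 < p < 1
    y     : true label, 1 or 0
    gamma : focusing parameter; larger values down-weight easy samples more
    alpha : class weight applied to the positive class (1 - alpha to the negative)
    """
    p_t = p if y == 1 else 1.0 - p            # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(p, y):
    """Plain binary cross-entropy for a single sample, for comparison."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


# A confidently correct prediction versus a badly misclassified positive sample.
easy = focal_loss(0.95, 1)
hard = focal_loss(0.10, 1)
print(f"easy sample loss: {easy:.6f}")
print(f"hard sample loss: {hard:.6f}")
```

Under these assumptions, the factor (1 - p_t)^gamma shrinks the loss of well-classified samples far more than that of misclassified ones, which is why a focal-style objective is a natural fit for reducing minority-class errors. With `gamma=0` and `alpha=1` the expression reduces to ordinary cross-entropy for a positive sample.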

Keywords