Machine Learning for an Enhanced Credit Risk Analysis: A Comparative Study of Loan Approval Prediction Models Integrating Mental Health Data

Adnan Alagic; Natasa Zivic; Esad Kadusic; Dzenan Hamzic; Narcisa Hadzajlic; Mejra Dizdarevic; Elmedin Selmanovic

doi:10.3390/make6010004

Machine Learning and Knowledge Extraction (Jan 2024)

Affiliations

Adnan Alagic: Polytechnic Faculty, University of Zenica, 72000 Zenica, Bosnia and Herzegovina
Natasa Zivic: Faculty of Digital Transformation (FDIT), Leipzig University of Applied Sciences, 04277 Leipzig, Germany
Esad Kadusic: Faculty of Educational Sciences, University of Sarajevo, 71000 Sarajevo, Bosnia and Herzegovina
Dzenan Hamzic: Polytechnic Faculty, University of Zenica, 72000 Zenica, Bosnia and Herzegovina
Narcisa Hadzajlic: Polytechnic Faculty, University of Zenica, 72000 Zenica, Bosnia and Herzegovina
Mejra Dizdarevic: Polytechnic Faculty, University of Zenica, 72000 Zenica, Bosnia and Herzegovina
Elmedin Selmanovic: Faculty of Science, University of Sarajevo, 71000 Sarajevo, Bosnia and Herzegovina

Journal volume & issue: Vol. 6, no. 1
pp. 53 – 77

Abstract

The number of loan requests is growing rapidly worldwide, making credit approval a multi-billion-dollar business. Large volumes of data extracted from banking transactions that represent customers' behavior are available, but processing loan applications remains a complex and time-consuming task for banking institutions. In 2022, over 20 million Americans had open loans, totaling USD 178 billion in debt, although over 20% of loan applications were rejected. Numerous statistical methods have been deployed to estimate loan risks, which raises the question of whether machine learning techniques can predict the potential risks better. To study the machine learning paradigm in this sector, a mental health dataset and a loan approval dataset, presenting survey results from 1991 individuals, are used as inputs to test the credit risk prediction ability of the chosen machine learning algorithms. Through a comprehensive comparative analysis, this paper shows how the chosen algorithms can distinguish between normal loan customers and risky ones who might never pay their debts back. The results show that XGBoost achieves the highest accuracy on the first dataset (84%), surpassing gradient boost (83%) and KNN (83%). On the second dataset, random forest achieves the highest accuracy (85%), followed by decision tree and KNN (83% each). Alongside accuracy, the precision, recall, and overall performance of the algorithms were tested, and a confusion matrix analysis was performed; the numerical results emphasize the superior performance of XGBoost and random forest in the classification tasks on the first dataset, and of XGBoost and decision tree on the second dataset. Researchers and practitioners can rely on these findings to inform their model selection process and to enhance the accuracy and precision of their classification models.
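
The abstract compares models by accuracy, precision, recall, and a confusion matrix analysis. As a minimal sketch of how those quantities relate for a binary "risky vs. normal" classifier, the example below derives all three metrics from the four confusion matrix counts. The function names, the label encoding (1 = risky), and the toy labels are assumptions made for illustration; they are not taken from the paper or its datasets.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["tp" if actual == positive else "fp"] += 1
        else:
            counts["fn" if actual == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["fn"], counts["tn"]


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    return {
        # Share of all predictions that were correct.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of predicted positives that were truly positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of true positives that the model found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical toy labels (1 = risky customer, 0 = normal customer).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

Two models with similar accuracy can differ in precision and recall, which is one reason a confusion matrix analysis is reported alongside accuracy when comparing classifiers.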

Published in Machine Learning and Knowledge Extraction

ISSN: 2504-4990 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware
Website: https://www.mdpi.com/journal/make
