PeerJ Computer Science (Apr 2024)

A comparative study of feature selection and feature extraction methods for financial distress identification

  • Dovilė Kuizinienė,
  • Paulius Savickas,
  • Rimantė Kunickaitė,
  • Rūta Juozaitienė,
  • Robertas Damaševičius,
  • Rytis Maskeliūnas,
  • Tomas Krilavičius

DOI
https://doi.org/10.7717/peerj-cs.1956
Journal volume & issue
Vol. 10
p. e1956

Abstract


Financial distress identification remains an essential topic in the scientific literature because of its importance for society and the economy. Advances in information technology and the growing volume of stored data have led to financial distress identification that extends beyond financial statements and their indicators (ratios). The feature space can be expanded by incorporating new categories of feature data, such as macroeconomic, sector, social, board, management, and judicial incident data, among others. However, the increased dimensionality results in sparse data and overfitted models. This study proposes a new approach for efficient financial distress classification that combines dimensionality reduction and machine learning techniques. The proposed framework aims to identify a subset of features that minimizes the loss function describing financial distress in an enterprise. During the study, 15 dimensionality reduction techniques with different numbers of features and 17 machine learning models were compared. Overall, 1,432 experiments were performed using Lithuanian enterprise data covering the period from 2015 to 2022. The results revealed that the artificial neural network (ANN) model with 30 ranked features identified using the Random Forest mean decreasing Gini (RF_MDG) feature selection technique provided the highest AUC score. Moreover, this study introduces a novel approach for feature extraction, which could improve financial distress classification models.
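
The best-performing combination reported above (ranking features by Random Forest mean decrease in Gini, keeping the top 30, and training an ANN scored by AUC) might be sketched roughly as follows. This is only an illustration under stated assumptions: it uses scikit-learn, a synthetic dataset from `make_classification` in place of the Lithuanian enterprise data, and arbitrary hyperparameters. The helper names `rank_features_by_gini` and `run_sketch` are hypothetical, and none of this reproduces the authors' actual pipeline or results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def rank_features_by_gini(X, y, n_keep=30, seed=0):
    """Return indices of the n_keep features with the largest
    Random Forest mean decrease in Gini impurity (hypothetical helper)."""
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(X, y)
    # With the default "gini" criterion, feature_importances_ is the
    # impurity-based (mean decrease in Gini) importance.
    order = np.argsort(forest.feature_importances_)[::-1]
    return order[:n_keep]


def run_sketch(seed=0):
    # Assumption: synthetic, imbalanced stand-in data, not the study's data.
    X, y = make_classification(
        n_samples=1000, n_features=100, n_informative=15,
        weights=[0.8, 0.2], random_state=seed,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )

    # Rank features on the training split only, to avoid leaking test data.
    selected = rank_features_by_gini(X_train, y_train, n_keep=30, seed=seed)

    # Assumption: a small multilayer perceptron stands in for the ANN.
    ann = make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=seed),
    )
    ann.fit(X_train[:, selected], y_train)

    # AUC is computed from the predicted probability of the positive class.
    scores = ann.predict_proba(X_test[:, selected])[:, 1]
    return selected, roc_auc_score(y_test, scores)


if __name__ == "__main__":
    selected, auc = run_sketch()
    print(f"kept {len(selected)} features, test AUC = {auc:.3f}")
```

Ranking on the training split alone keeps the held-out AUC an honest estimate. Changing `n_keep` would mimic the comparison across different numbers of features described in the abstract.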

Keywords