The effect of feature extraction and data sampling on credit card fraud detection

Zahra Salekshahrezaee; Joffrey L. Leevy; Taghi M. Khoshgoftaar

doi:10.1186/s40537-023-00684-w

Journal of Big Data (Jan 2023)

Affiliations

Zahra Salekshahrezaee: Florida Atlantic University
Joffrey L. Leevy: Florida Atlantic University
Taghi M. Khoshgoftaar: Florida Atlantic University

DOI: https://doi.org/10.1186/s40537-023-00684-w
Journal volume & issue: Vol. 10, no. 1, pp. 1–17

Abstract

Training a machine learning algorithm on a class-imbalanced dataset is difficult, and high dimensionality makes it harder still. Feature extraction and data sampling are among the most popular preprocessing techniques for these problems. Feature extraction derives a smaller, more informative set of features from the original ones, while data sampling mitigates class imbalance. In this paper, we investigate both techniques using a credit card fraud dataset and four ensemble classifiers (Random Forest, CatBoost, LightGBM, and XGBoost). For feature extraction, we evaluate Principal Component Analysis (PCA) and a Convolutional Autoencoder (CAE). For data sampling, we evaluate Random Undersampling (RUS), the Synthetic Minority Oversampling Technique (SMOTE), and SMOTE Tomek. Classification performance is measured with the F1 score and the Area Under the Receiver Operating Characteristic Curve (AUC). Our results show that applying RUS followed by CAE yields the best performance for credit card fraud detection.
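To make two of the ideas in the abstract concrete, the following is a minimal sketch of random undersampling and of the F1 score, written in plain Python. It is not the authors' implementation: the function names `random_undersample` and `f1_score`, the 1:1 target ratio, the fixed seed, and the toy data are all assumptions made for illustration. The feature-extraction step (PCA or CAE) and the ensemble classifiers are omitted.

```python
import random
from collections import Counter


def random_undersample(X, y, ratio=1.0, seed=0):
    """Randomly drop majority-class rows (hypothetical helper).

    Keeps every minority-class row and at most `ratio * n_minority` rows
    of each other class. The 1:1 default is an assumption, not a value
    taken from the paper.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    n_min = counts[minority]
    keep = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        if label != minority:
            n_keep = min(len(idx), int(round(n_min * ratio)))
            idx = rng.sample(idx, n_keep)
        keep.extend(idx)
    keep.sort()  # preserve the original row order
    return [X[i] for i in keep], [y[i] for i in keep]


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy imbalanced data (made up): 10 "fraud" rows among 1000.
y = [1] * 10 + [0] * 990
X = [[float(i)] for i in range(1000)]
X_rus, y_rus = random_undersample(X, y)
```

In this sketch, `y_rus` contains all 10 minority rows and a random sample of 10 majority rows. In a real pipeline the resampling would be applied to the training split only, so that the test data keeps its original class distribution.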

Published in Journal of Big Data

ISSN: 2196-1115 (Online)
Publisher: SpringerOpen
Country of publisher: United Kingdom
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware; Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://journalofbigdata.springeropen.com
