Machine Learning Based on Resampling Approaches and Deep Reinforcement Learning for Credit Card Fraud Detection Systems

Tran Khanh Dang; Thanh Cong Tran; Luc Minh Tuan; Mai Viet Tiep

doi:10.3390/app112110004

Applied Sciences (Oct 2021)

Affiliations

Tran Khanh Dang: Ho Chi Minh City University of Technology, VNU-HCM, Ho Chi Minh City, Vietnam
Thanh Cong Tran: Ho Chi Minh City University of Economics and Finance, Ho Chi Minh City, Vietnam
Luc Minh Tuan: Ton Duc Thang University, Ho Chi Minh City, Vietnam
Mai Viet Tiep: Academy of Cryptography Techniques, Ho Chi Minh City, Vietnam

DOI: https://doi.org/10.3390/app112110004
Journal volume & issue: Vol. 11, no. 21
p. 10004

Abstract

The problem of imbalanced datasets is a significant concern when building reliable credit card fraud (CCF) detection systems. In this work, we study and evaluate recent advances in machine learning (ML) algorithms and deep reinforcement learning (DRL) for CCF detection on data labeled as fraud or non-fraud. Two resampling approaches, SMOTE and ADASYN, are used to resample the imbalanced CCF dataset, and ML algorithms are then applied to the balanced dataset to build CCF detection systems. Next, DRL is employed to build detection systems directly on the imbalanced CCF dataset. A range of classification metrics is used to thoroughly evaluate the performance of these ML and DRL models. Through empirical experiments, we assess how reliable the ML models (combined with each resampling approach) and the DRL models are for CCF detection. When SMOTE and ADASYN are used to resample the original CCF dataset before the training/test split, the ML models achieve very high results, above 99% accuracy. However, when these techniques are applied only to the training set, the ML models perform worse, most notably logistic regression with ADASYN, which reaches only 1.81% precision and a 3.55% F1 score. Our work also reveals that the DRL model is ineffective and performs poorly, achieving only 34.8% accuracy.
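The abstract's central comparison is whether resampling happens before or after the training/test split. A well-known pitfall of oversampling the whole dataset first is that synthetic fraud rows interpolated from test-set fraud rows end up in training, and the test set itself becomes artificially balanced; this may help explain the gap between the two settings. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline: it uses only the Python standard library, a simplified SMOTE (random minority sample, one of its k nearest minority neighbours, linear interpolation), toy data, and hypothetical helper names (`smote`, `split`, `oversample_train_only`, `precision_recall_f1`). It does not implement ADASYN, any classifier, or DRL.

```python
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def smote(minority, n_new, k=5, rng=None):
    """Simplified SMOTE sketch (hypothetical helper): each synthetic point lies on
    the segment between a random minority sample and one of its k nearest
    minority neighbours."""
    rng = rng or random.Random(0)
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base`, excluding the sample itself
        others = [minority[j] for j in range(len(minority)) if j != i]
        neighbours = sorted(others, key=lambda p: euclidean(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic


def split(X, y, test_frac=0.3, rng=None):
    """Shuffle row indices and hold out `test_frac` of the rows as the test set."""
    rng = rng or random.Random(0)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    cut = int(len(idx) * (1 - test_frac))
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def oversample_train_only(X_train, y_train, rng=None):
    """Balance the training set with the SMOTE sketch; the test set is never touched."""
    minority = [x for x, label in zip(X_train, y_train) if label == 1]
    n_majority = sum(1 for label in y_train if label == 0)
    new = smote(minority, max(0, n_majority - len(minority)), rng=rng)
    return X_train + new, y_train + [1] * len(new)


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive (fraud = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gen = random.Random(42)
    # Hypothetical toy data: 200 legitimate rows near (0, 0), 20 fraud rows near (3, 3)
    X = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(200)]
    X += [(gen.gauss(3, 1), gen.gauss(3, 1)) for _ in range(20)]
    y = [0] * 200 + [1] * 20

    # Split first, then resample only the training portion
    X_tr, y_tr, X_te, y_te = split(X, y, rng=random.Random(1))
    X_bal, y_bal = oversample_train_only(X_tr, y_tr, rng=random.Random(2))
    print("train before:", y_tr.count(0), y_tr.count(1))
    print("train after: ", y_bal.count(0), y_bal.count(1))
    print("test (untouched):", y_te.count(0), y_te.count(1))
```

Because the test set keeps its original imbalance, precision is dominated by how many legitimate transactions get flagged: in `precision_recall_f1`, a model that flags every row of a test set with 2 frauds among 100 rows gets perfect recall but only 2% precision. ADASYN, which the paper also uses, differs from this sketch mainly in allocating more synthetic points to minority samples surrounded by majority neighbours.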

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
