Threshold optimization and random undersampling for imbalanced credit card data

Joffrey L. Leevy; Justin M. Johnson; John Hancock; Taghi M. Khoshgoftaar

doi:10.1186/s40537-023-00738-z

Journal of Big Data (May 2023)

Affiliations

Joffrey L. Leevy: Florida Atlantic University
Justin M. Johnson: Florida Atlantic University
John Hancock: Florida Atlantic University
Taghi M. Khoshgoftaar: Florida Atlantic University

Journal volume & issue: Vol. 10, no. 1
pp. 1 – 22

Abstract

Output thresholding is well suited to addressing class imbalance, since the technique does not increase dataset size, risk discarding important instances, or modify an existing learner. Using the Credit Card Fraud Detection Dataset, this study proposes a threshold optimization approach that imposes the constraint True Positive Rate (TPR) ≥ True Negative Rate (TNR). Our findings indicate that an increase in the Area Under the Precision–Recall Curve (AUPRC) score is associated with an improvement in threshold-based classification scores, while an increase in the positive class prior probability causes optimal thresholds to increase. In addition, we found that the best overall results for selecting an optimal threshold are obtained without Random Undersampling (RUS). Furthermore, with the exception of AUPRC, we established that the default threshold yields good performance scores at a balanced class ratio. Our evaluation of four threshold optimization techniques, eight threshold-dependent metrics, and two threshold-agnostic metrics is what distinguishes this research.
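To make the TPR ≥ TNR constraint concrete, here is a minimal sketch of constrained threshold selection. It assumes the model emits one score per instance, that an instance is labeled positive (fraud) when its score is at or above the threshold, and that the quantity being maximized is the geometric mean of TPR and TNR. The objective, the exhaustive search over observed scores, and the function names `rates`, `g_mean`, and `constrained_threshold` are illustrative assumptions, not the authors' four optimization techniques. In practice the threshold would be chosen on held-out validation scores rather than on test data.

```python
import math
from typing import Callable, Optional, Sequence, Tuple


def rates(y_true: Sequence[int], scores: Sequence[float], threshold: float) -> Tuple[float, float]:
    """Return (TPR, TNR), labeling an instance positive when its score >= threshold."""
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return tpr, tnr


def g_mean(tpr: float, tnr: float) -> float:
    """Geometric mean of TPR and TNR (an assumed objective, not necessarily the paper's)."""
    return math.sqrt(tpr * tnr)


def constrained_threshold(
    y_true: Sequence[int],
    scores: Sequence[float],
    objective: Callable[[float, float], float] = g_mean,
) -> Tuple[Optional[float], float]:
    """Search the observed scores for the threshold that maximizes `objective`
    among thresholds satisfying TPR >= TNR."""
    best_t: Optional[float] = None
    best_val = -math.inf
    for t in sorted(set(scores)):
        tpr, tnr = rates(y_true, scores, t)
        if tpr < tnr:  # violates the TPR >= TNR constraint
            continue
        val = objective(tpr, tnr)
        if val > best_val:  # strict comparison: ties keep the lower threshold
            best_t, best_val = t, val
    return best_t, best_val


if __name__ == "__main__":
    # Toy validation labels (1 = fraud) and hypothetical model scores.
    y = [0, 0, 0, 0, 1, 0, 1, 1]
    s = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.9]
    print(constrained_threshold(y, s))  # prints (0.4, 0.894...)
```

Raising the threshold can only lower TPR and raise TNR, so the lowest observed score (where TPR = 1) always satisfies the constraint whenever positives are present, and the search never returns empty in that case.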

Published in Journal of Big Data

ISSN: 2196-1115 (Online)
Publisher: SpringerOpen
Country of publisher: United Kingdom
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware; Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://journalofbigdata.springeropen.com
