Improved LightGBM for Extremely Imbalanced Data and Application to Credit Card Fraud Detection

Xiaosong Zhao; Yong Liu; Qiangfu Zhao

doi:10.1109/ACCESS.2024.3487212

IEEE Access (Jan 2024)

Affiliations

Xiaosong Zhao: Graduate School, The University of Aizu, Aizuwakamatsu, Fukushima, Japan
Yong Liu: Graduate School, The University of Aizu, Aizuwakamatsu, Fukushima, Japan
Qiangfu Zhao: Graduate School, The University of Aizu, Aizuwakamatsu, Fukushima, Japan

DOI: https://doi.org/10.1109/ACCESS.2024.3487212
Journal volume & issue: Vol. 12, pp. 159316–159335

Abstract

Credit card fraud (CCF) is a significant threat to cardholders and financial institutions. Detecting CCF is challenging because the data are extremely imbalanced. Extremely imbalanced data (EID) pose two problems: very few fraud instances are available for training, and the risk of overlooking fraud is very high. Class balancing and oversampling techniques address the first problem by penalizing the negative class or augmenting the positive data, but they do not mitigate the second. Cost-sensitive learning, in contrast, targets only the high risk of false negative errors. Existing approaches are therefore insufficient to solve all aspects of the EID problem. Based on the LightGBM (Light Gradient Boosting Machine) framework, this study introduces two novel machine-learning methods: the class balancing cost-harmonization LightGBM (CB-CHL-LightGBM) and the oversampling cost-harmonization LightGBM (OS-CHL-LightGBM). The new approaches combine class balancing or oversampling techniques with LightGBM to address the EID problem comprehensively, and they enhance the efficacy of LightGBM in CCF detection scenarios. Experimental results on three CCF datasets indicate that the two proposed methods outperform LightGBM on several crucial performance metrics. For example, compared with the original LightGBM, CB-CHL-LightGBM or OS-CHL-LightGBM can increase the F2-score from 0.77 to 0.83 for the first dataset, from 0.77 to 0.86 for the second dataset, and from 0.70 to 0.82 for the third dataset. However, adding class balancing, oversampling, or the cost-harmonization loss to LightGBM on its own may not yield better results.
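The F2-score reported in the abstract is the F-beta measure with beta = 2, which weights recall more heavily than precision and so penalizes missed fraud (false negatives) more than false alarms. A minimal sketch of that standard definition follows; the function name and the example counts are illustrative assumptions, not taken from the paper.

```python
def fbeta_from_counts(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta score computed from confusion-matrix counts (standard definition)."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    # Convention assumed here: return 0.0 when the denominator is zero.
    return (1 + b2) * tp / denom if denom else 0.0


# Hypothetical counts: the same number of errors, but of different kinds.
missed_fraud = fbeta_from_counts(tp=8, fp=0, fn=4)  # 4 false negatives
false_alarms = fbeta_from_counts(tp=8, fp=4, fn=0)  # 4 false positives
print(f"F2 with 4 missed frauds: {missed_fraud:.3f}")
print(f"F2 with 4 false alarms:  {false_alarms:.3f}")
```

With these counts the missed-fraud case scores lower than the false-alarm case, which is why F2 suits settings where overlooking fraud is the costlier error.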

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
