Feature generation and contribution comparison for electronic fraud detection

Yen-Wu Ti; Yu-Yen Hsin; Tian-Shyr Dai; Ming-Chuan Huang; Liang-Chih Liu

doi:10.1038/s41598-022-22130-2

Scientific Reports (Oct 2022)

Affiliations

Yen-Wu Ti: College of Artificial Intelligence, Yango University
Yu-Yen Hsin: Institute of Finance, National Yang Ming Chiao Tung University
Tian-Shyr Dai: Department of Information Management and Finance and Institute of Finance, National Yang Ming Chiao Tung University
Ming-Chuan Huang: Institute of Computer Science and Engineering, National Yang Ming Chiao Tung University
Liang-Chih Liu: Department of Information and Finance Management, National Taipei University of Technology

DOI: https://doi.org/10.1038/s41598-022-22130-2
Journal volume & issue: Vol. 12, no. 1
pp. 1 – 11

Abstract

The convenience of modern money transfer services attracts fraudulent actors, who run scams in which victims are deceived into transferring funds to fraudulent accounts. Machine learning models are broadly applied because traditional rule-based approaches detect fraud poorly. Learning directly from raw transaction data is impractical due to its high-dimensional nature; most studies instead construct features by extracting patterns from raw transaction data. Past literature categorizes these features into recency, frequency, monetary, and anomaly detection features. We use various machine learning algorithms to examine the performance of features in these four categories on real transaction data, and we compare them with the performance of our feature generation guideline, which is based on statistical perspectives and the characteristics of (non-)fraudulent accounts. The results show that, except for the monetary category, the feature categories used in the literature perform poorly regardless of which machine learning algorithm is used; anomaly detection features perform the worst. We find that even statistical features generated based on financial knowledge yield limited performance on a real transaction dataset. Our atypical detection characteristic of normal accounts improves the ability to distinguish them from fraudulent accounts and hence improves the overall detection results, outperforming other existing methods.
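
To make the recency, frequency, and monetary categories named in the abstract concrete, the following is a minimal Python sketch of how such per-account features are commonly derived from raw transactions. It is not the authors' feature set or guideline: the function name `rfm_features`, the `(account_id, timestamp, amount)` tuple layout, and the four output fields are assumptions chosen for illustration only.

```python
from collections import defaultdict
from datetime import datetime


def rfm_features(transactions, as_of):
    """Illustrative recency/frequency/monetary features per account.

    transactions: iterable of (account_id, timestamp, amount) tuples
                  (a hypothetical layout, not taken from the paper).
    as_of:        datetime at which the features are evaluated.
    """
    grouped = defaultdict(list)
    for account, ts, amount in transactions:
        if ts <= as_of:  # ignore transactions after the evaluation time
            grouped[account].append((ts, amount))

    features = {}
    for account, rows in grouped.items():
        last_ts = max(ts for ts, _ in rows)
        amounts = [amount for _, amount in rows]
        features[account] = {
            # Recency: days elapsed since the most recent transaction.
            "recency_days": (as_of - last_ts).total_seconds() / 86400.0,
            # Frequency: number of transactions observed so far.
            "frequency": len(rows),
            # Monetary: total and average transferred amount.
            "monetary_total": sum(amounts),
            "monetary_mean": sum(amounts) / len(amounts),
        }
    return features


if __name__ == "__main__":
    sample = [
        ("A", datetime(2022, 1, 1), 100.0),
        ("A", datetime(2022, 1, 3), 50.0),
        ("B", datetime(2022, 1, 2), 20.0),
    ]
    for account, feats in rfm_features(sample, datetime(2022, 1, 4)).items():
        print(account, feats)
```

Feature dictionaries of this kind would typically be turned into a numeric matrix and passed to a classifier; the anomaly detection category and the authors' own statistical features are not covered by this sketch.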

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/
