IEEE Access (Jan 2024)
Anomaly Detection in Restaurant Receipts Data
Abstract
Checking receipt data is essential for tax authorities to ensure compliance with financial reporting regulations, identify potential anomalies, and prevent tax violations. This study explores the application of genetic algorithms (GA) for hyperparameter optimization (HPO) of Isolation Forest (IF) and Autoencoder (AE) models, with the objective of enhancing anomaly detection (AD) and a focus on identifying cash registers (CRs) whose patterns closely resemble known anomalies. The dataset comprises 2.7 million receipt records with transactional details such as payment amounts, dates, and the ratio of cash to cashless transactions. Because anomalies were identified at the CR level, the receipt data underwent feature engineering and were aggregated by cash register; the resulting dataset included 40,715 CRs, 200 of which were known anomalies. The primary goal was to identify, as comprehensively as possible, both the known anomalies and the CRs closely resembling them; the latter are the false positives (FP) produced by the best optimized IF and AE models. The best-performing IF model, evaluated on a test set of 20,375 CR records with 100 known anomalies, correctly identified 25 anomalies and flagged 999 similar (FP) patterns. The AE model, tested on 8,154 records with 39 known anomalies, correctly identified 15 and flagged an additional 437 CRs (FP). Subsequent clustering analysis of the FPs and the correctly identified known anomalies, i.e., the true positives (TP), reveals nuanced restaurant transactional patterns suggestive of potential tax evasion, aiding strategic decision-making regarding tax authority inspections.
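As a reading aid, the counts reported above can be converted into recall and precision for each model. The following is a minimal sketch, not part of the study's code: the function name `detection_metrics` is hypothetical, and it assumes that every flagged CR is either a TP or an FP in the abstract's sense, so that flagged = TP + FP.

```python
def detection_metrics(true_positives, false_positives, known_anomalies):
    """Summarize a detector's output from the counts given in the abstract.

    Assumption: flagged CRs = TP + FP, where FP means a flagged CR that is
    not among the known anomalies (the abstract's usage of the term).
    """
    flagged = true_positives + false_positives
    return {
        "flagged": flagged,
        # Share of known anomalies that the model recovered.
        "recall": true_positives / known_anomalies,
        # Share of flagged CRs that are known anomalies.
        "precision": true_positives / flagged,
    }


# Counts taken directly from the abstract.
if_metrics = detection_metrics(true_positives=25, false_positives=999, known_anomalies=100)
ae_metrics = detection_metrics(true_positives=15, false_positives=437, known_anomalies=39)

for name, m in (("IF", if_metrics), ("AE", ae_metrics)):
    print(f"{name}: flagged={m['flagged']}, recall={m['recall']:.3f}, precision={m['precision']:.3f}")
```

Low precision is expected under this framing, since the FPs are treated as candidates for inspection rather than as errors.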
Keywords