IEEE Access (Jan 2023)
A New Framework for Fraud Detection in Bitcoin Transactions Through Ensemble Stacking Model in Smart Cities
Abstract
Bitcoin has a reputation for being used in unlawful activities, such as money laundering, dark web transactions, and ransomware payments, in the context of smart cities. Blockchain technology is designed to prevent illegal transactions, but it cannot detect them. Anomaly detection is a fundamental technique for recognizing potential fraud. Earlier detection techniques were built on heuristic and signature-based approaches, but these methods were insufficient to capture the full complexity of anomaly detection. Machine Learning (ML) is a promising approach to anomaly detection, as it can be trained on large datasets of known malware samples to identify patterns and features of the transactions. Researchers are therefore seeking an efficient fraud and security threat detection model that overcomes the drawbacks of existing methods. Ensemble learning can be applied to anomaly detection in Bitcoin by combining multiple ML classifiers. In the proposed model, the ADASYN-TL (Adaptive Synthetic + Tomek Link) technique is used for data balancing. Because hyperparameters have a great impact on model performance, random search, grid search, and Bayesian optimization are used for hyperparameter tuning. For classification, we use a stacking model that combines Decision Tree, Naive Bayes, K-Nearest Neighbors, and Random Forest. We use SHapley Additive exPlanation (SHAP) to interpret the predictions of the stacking model. We also compare the performance of different classifiers using accuracy, F1-score, Area Under Curve-Receiver Operating Characteristic (AUC-ROC), precision, recall, False Positive Rate (FPR), and execution time, and select the best-performing model. The proposed model contributes to the development of effective fraud detection models that address the limitations of existing algorithms. Our stacking model, which combines the predictions of multiple classifiers, achieved the highest F1-score of 97%, precision of 96%, recall of 98%, accuracy of 97%, AUC-ROC of 99%, and FPR of 3%.
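As a reading aid, the threshold-based metrics named above can be sketched from the four confusion-matrix counts. This is a minimal illustration, not the authors' evaluation code: the function names `classification_metrics` and `f1_from` and the example counts are hypothetical, and AUC-ROC is omitted because it requires predicted scores rather than counts.

```python
def _safe_div(numerator, denominator):
    # Return 0.0 instead of raising when a denominator is empty.
    return numerator / denominator if denominator else 0.0


def f1_from(precision, recall):
    # F1 is the harmonic mean of precision and recall.
    return _safe_div(2 * precision * recall, precision + recall)


def classification_metrics(tp, fp, tn, fn):
    # tp/fp/tn/fn: true/false positives and negatives for the fraud class.
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
        "accuracy": _safe_div(tp + tn, tp + fp + tn + fn),
        "fpr": _safe_div(fp, fp + tn),  # false positive rate
    }


# Hypothetical counts, chosen only to exercise the formulas.
metrics = classification_metrics(tp=90, fp=10, tn=880, fn=20)

# Consistency check on the reported figures: a precision of 96% and a
# recall of 98% imply an F1-score that rounds to 97%.
reported_f1 = round(f1_from(0.96, 0.98), 2)
```

The last line shows that the reported precision, recall, and F1-score are mutually consistent under the standard harmonic-mean definition of F1.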
Keywords