Applied Sciences (Feb 2023)
A Powerful Predicting Model for Financial Statement Fraud Based on Optimized XGBoost Ensemble Learning Technique
Abstract
This study aims to develop a better Financial Statement Fraud (FSF) detection model by utilizing data from publicly available financial statements of firms in the MENA region. We develop an FSF model using a powerful ensemble technique, the XGBoost (eXtreme Gradient Boosting) algorithm, that helps to identify fraud in a set of sample companies drawn from the Middle East and North Africa (MENA) region. The issue of class imbalance in the dataset is addressed by applying the Synthetic Minority Oversampling Technique (SMOTE) algorithm. We use different Machine Learning techniques in Python to predict FSF, and our empirical findings show that the XGBoost algorithm outperformed the other algorithms in this study, namely, Logistic Regression (LR), Decision Tree (DT), Support Vector Machine (SVM), AdaBoost, and Random Forest (RF). We then optimize the XGBoost algorithm to obtain the best result, with a final accuracy of 96.05% in the detection of FSF.
Keywords