Bagging Vs. Boosting in Ensemble Machine Learning? An Integrated Application to Fraud Risk Analysis in the Insurance Sector

Ruixing Ming; Osama Mohamad; Nisreen Innab; Mohamed Hanafy

doi:10.1080/08839514.2024.2355024

Applied Artificial Intelligence (Dec 2024)

Affiliations

Ruixing Ming: School of Statistics and Mathematics, Zhejiang Gongshang University, Hangzhou, China
Osama Mohamad: School of Statistics and Mathematics, Zhejiang Gongshang University, Hangzhou, China
Nisreen Innab: Department of Computer Science and Information Systems, College of Applied Sciences, AlMaarefa University, Riyadh, Saudi Arabia
Mohamed Hanafy: Department of Statistics, Mathematics, and Insurance, Faculty of Commerce, Assiut University, Asyut, Egypt

DOI: https://doi.org/10.1080/08839514.2024.2355024
Journal volume & issue: Vol. 38, no. 1

Abstract

Insurance fraud causes substantial financial losses and erodes trust within the insurance industry. To address this challenge, this study introduces an automated detection system based on ensemble machine learning (EML) algorithms. The approach comprises four phases: 1) tackling data imbalance through re-sampling methods (over-sampling, under-sampling, and hybrid); 2) selecting features (filter, wrapper, and embedded methods) to enhance model accuracy; 3) applying binary classification techniques (bagging and boosting) to identify fraud; and 4) applying explanatory model analysis (Shapley Additive Explanations, break-down plots, and variable-importance measures) to evaluate the influence of individual features on model performance. The analysis shows that, while not every re-sampling technique improves model performance, all feature selection methods markedly improve predictive accuracy. The combination of the Gradient Boosting Machine (GBM) algorithm with NCR re-sampling and GBMVI feature selection emerges as the most effective configuration for fraud detection. The study advances the theoretical framework for combating insurance fraud with AI and offers a practical blueprint for insurance companies that aim to incorporate AI into their fraud detection, thereby mitigating financial risk and fostering trust.
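
To make the bagging versus boosting contrast in phase 3 concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It rests on several assumptions: decision stumps serve as base learners, AdaBoost stands in for boosting (the study's best-performing model is GBM, which is a different boosting algorithm), labels are coded as -1 and +1, and the toy data and all function names are hypothetical.

```python
import math
import random


def fit_stump(X, y, w):
    """Return ((feature, threshold, polarity), weighted_error) for the best stump."""
    best = (float("inf"), 0, 0.0, 1)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for pol in (1, -1):
                # A stump predicts `pol` when row[j] >= t, and `-pol` otherwise.
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (pol if row[j] >= t else -pol) != yi)
                if err < best[0]:
                    best = (err, j, t, pol)
    return (best[1], best[2], best[3]), best[0]


def stump_predict(stump, row):
    j, t, pol = stump
    return pol if row[j] >= t else -pol


def bagging(X, y, n_estimators, rng):
    """Bagging: fit each stump independently on a bootstrap sample, equal votes."""
    n = len(X)
    model = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        stump, _ = fit_stump(Xb, yb, [1.0 / n] * n)
        model.append((1.0, stump))
    return model


def adaboost(X, y, n_estimators):
    """Boosting (AdaBoost): fit stumps sequentially, up-weighting past mistakes."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(n_estimators):
        stump, err = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Misclassified rows gain weight; correctly classified rows lose weight.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, row):
    """Weighted vote of all stumps; a tie is resolved as +1."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in model)
    return 1 if score >= 0 else -1


def accuracy(model, X, y):
    return sum(predict(model, r) == t for r, t in zip(X, y)) / len(y)


# Hypothetical toy data: the positive class occupies an interval,
# which no single stump can separate.
X_demo = [[float(i)] for i in range(12)]
y_demo = [1 if 4 <= i <= 7 else -1 for i in range(12)]
print("bagging :", accuracy(bagging(X_demo, y_demo, 15, random.Random(0)), X_demo, y_demo))
print("boosting:", accuracy(adaboost(X_demo, y_demo, 15), X_demo, y_demo))
```

The structural difference is visible in the two training loops: bagging draws independent bootstrap samples and gives every stump an equal vote, while boosting trains on the full data with weights that depend on the errors of earlier stumps and gives each stump a vote proportional to its weighted accuracy.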

Published in Applied Artificial Intelligence

ISSN: 0883-9514 (Print); 1087-6545 (Online)
Publisher: Taylor & Francis Group
Country of publisher: United Kingdom
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science; Science: Science (General): Cybernetics
Website: https://www.tandfonline.com/journals/uaai
