Predictive Fraud Analysis Applying the Fraud Triangle Theory through Data Mining Techniques

Marco Sánchez-Aguayo; Luis Urquiza-Aguiar; José Estrada-Jiménez

doi:10.3390/app12073382

Applied Sciences (Mar 2022)

Affiliations

Marco Sánchez-Aguayo: Departamento de Informática y Ciencias de la Computación, Escuela Politécnica Nacional, Ladrón de Guevara E11-253, Quito 170517, Ecuador
Luis Urquiza-Aguiar: Departamento de Electrónica, Telecomunicaciones y Redes de Información, Escuela Politécnica Nacional, Ladrón de Guevara E11-253, Quito 170517, Ecuador
José Estrada-Jiménez: Departamento de Electrónica, Telecomunicaciones y Redes de Información, Escuela Politécnica Nacional, Ladrón de Guevara E11-253, Quito 170517, Ecuador

DOI: https://doi.org/10.3390/app12073382
Journal volume & issue: Vol. 12, no. 7
p. 3382

Abstract

Fraud is increasingly common, and so are the losses it causes. There is, thus, a strong economic incentive to study this problem, particularly fraud prevention. One barrier complicating research in this direction is the lack of public data sets that contain fraudulent activity. In addition, although machine learning has been applied to fraud detection, these efforts have not considered the human-behavior component of fraud. In this work, we propose a mechanism that detects potential fraud by analyzing human behavior within a data set. The approach combines a predefined topic model with a supervised classifier to raise an alert from possibly fraud-related text; potential fraud is then detected by a model built from this classifier. As part of this work, we create a synthetic fraud-related data set. When assessing different topic modeling techniques, we uncover four topics associated with the vertices of the fraud triangle theory. After benchmarking topic modeling techniques and supervised and deep learning classifiers, we find that LDA, random forest, and CNN perform best in this scenario. Our results suggest that the approach is feasible in practice, since several of these models obtain an average AUC higher than 0.8. Overall, the fraud triangle theory combined with topic modeling and linear classifiers could provide a promising framework for predictive fraud analysis.
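The abstract describes two technical steps: scoring text for signals tied to the fraud triangle, and judging the resulting classifiers by AUC. The sketch below illustrates both under loudly stated assumptions. It is not the authors' method: a toy keyword lexicon stands in for their LDA topic model and supervised classifier, the word lists in `TRIANGLE_LEXICON` are hypothetical, and the sample messages are invented for illustration only. The `roc_auc` helper computes the standard ROC AUC in its pairwise (Mann–Whitney) form: the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counting one half.

```python
from collections import Counter

# HYPOTHETICAL word lists for the three vertices of the fraud triangle.
# Illustrative only; not the lexicon or topic model used in the paper.
TRIANGLE_LEXICON = {
    "pressure": {"debt", "deadline", "quota", "bonus", "loan"},
    "opportunity": {"access", "password", "unsupervised", "override", "approve"},
    "rationalization": {"deserve", "borrow", "unfair", "everyone", "temporary"},
}


def triangle_features(text):
    """Fraction of tokens in `text` that belong to each vertex vocabulary."""
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    counts = Counter()
    for tok in tokens:
        for vertex, words in TRIANGLE_LEXICON.items():
            if tok in words:
                counts[vertex] += 1
    n = max(len(tokens), 1)  # avoid division by zero on empty text
    return {v: counts[v] / n for v in TRIANGLE_LEXICON}


def alert_score(text):
    """Toy risk score: mean of the three vertex features (a stand-in for a classifier)."""
    feats = triangle_features(text)
    return sum(feats.values()) / len(feats)


def roc_auc(scores, labels):
    """ROC AUC: probability a random positive outranks a random negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented example messages labeled 1 (fraud-related) or 0 (benign).
messages = [
    ("I am behind on my loan and the deadline is Friday", 1),
    ("I still have the admin password, nobody checks, I deserve it", 1),
    ("Lunch meeting moved to noon", 0),
    ("Please review the attached quarterly report", 0),
]
texts, labels = zip(*messages)
scores = [alert_score(t) for t in texts]
print(roc_auc(scores, labels))  # prints 1.0
```

On this tiny invented set every fraud-related message contains at least one lexicon word and neither benign message does, so the AUC is trivially 1.0. The pairwise form is quadratic in the number of examples, which is fine for illustration; a real evaluation over many samples would normally use a library implementation of the same metric.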

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
