Sequence Embeddings Help Detect Insurance Fraud

Ivan Fursov; Elizaveta Kovtun; Rodrigo Rivera-Castro; Alexey Zaytsev; Rasul Khasyanov; Martin Spindler; Evgeny Burnaev

doi:10.1109/ACCESS.2022.3149480

IEEE Access (Jan 2022)

Affiliations

Ivan Fursov: Skolkovo Institute of Science and Technology, Moscow, Russia
Elizaveta Kovtun: Skolkovo Institute of Science and Technology, Moscow, Russia
Rodrigo Rivera-Castro: Skolkovo Institute of Science and Technology, Moscow, Russia
Alexey Zaytsev: Skolkovo Institute of Science and Technology, Moscow, Russia
Rasul Khasyanov: Skolkovo Institute of Science and Technology, Moscow, Russia
Martin Spindler: Faculty of Business Administration, University of Hamburg, Hamburg, Germany
Evgeny Burnaev: Skolkovo Institute of Science and Technology, Moscow, Russia

Journal volume & issue: Vol. 10, pp. 32060–32074

Abstract

Roughly 10 percent of the insurance industry’s incurred losses are estimated to stem from fraudulent claims. One solution is to use tabular data to construct models that can distinguish between legitimate and fraudulent claims. However, while canonical tabular data models enable robust fraud detection, complex sequential data have remained outside the insurance industry’s scope. For health insurance, we propose deep learning architectures that process insurance data consisting of sequential records of patient visits and characteristics. Both the sequential and tabular components improve the quality of the model, generating new insights into the detection of health insurance fraud. Empirical results derived using relevant data from a health insurance company show that our approach outperforms state-of-the-art models and can substantially improve the claims management process. We obtain a ROC AUC of 0.873, while the best competitor based on state-of-the-art models achieves 0.815. Moreover, we demonstrate that our architectures are more robust to data corruption. As more and more semi-structured event sequence data become available to insurers, our methods will be valuable for many similar applications, particularly when variables have a large number of categories, such as International Classification of Diseases (ICD) codes or other classification codes.
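To make the two ingredients of the abstract concrete, here is a minimal sketch under stated assumptions, not the authors' implementation. It does two things. First, it turns a client's sequence of visits, each a list of ICD-style codes, into a fixed-length vector and concatenates it with tabular features. Second, it computes ROC AUC as the probability that a randomly chosen fraudulent case scores above a randomly chosen legitimate one, with ties counting one half. The paper learns sequence representations with deep networks. Here, mean pooling over a hand-written embedding table is a swapped-in simplification. The function names `embed_visits`, `client_features` and `roc_auc`, and the toy codes and vectors, are hypothetical.

```python
from itertools import product
from typing import Dict, List, Sequence


def embed_visits(visits: List[List[str]], table: Dict[str, List[float]], dim: int) -> List[float]:
    """Mean-pool code embeddings within each visit, then across visits.

    Hypothetical stand-in for a learned sequence encoder (e.g. an RNN or
    Transformer). Unknown codes and empty visits map to zero vectors.
    """
    zero = [0.0] * dim
    if not visits:
        return zero
    visit_vecs = []
    for codes in visits:
        vecs = [table.get(c, zero) for c in codes] or [zero]
        # Average the code vectors of one visit, coordinate by coordinate.
        visit_vecs.append([sum(col) / len(vecs) for col in zip(*vecs)])
    # Average the visit vectors to get one vector per client.
    return [sum(col) / len(visit_vecs) for col in zip(*visit_vecs)]


def client_features(visits: List[List[str]], tabular: List[float],
                    table: Dict[str, List[float]], dim: int) -> List[float]:
    """Concatenate the sequence embedding with the tabular features."""
    return embed_visits(visits, table, dim) + list(tabular)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison (label 1 = fraud, 0 = legitimate)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # A positive outranking a negative counts 1, a tie counts 0.5.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    toy_table = {"J45": [1.0, 0.0], "E11": [0.0, 1.0]}  # hypothetical embeddings
    print(client_features([["J45", "E11"], ["J45"]], [42.0, 1.0], toy_table, 2))
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Read this way, an AUC of 0.873 means a fraudulent case outranks a legitimate one in about 87 percent of such pairs. The pairwise loop costs O(P·N). For large data, a sort-based rank formula or a library routine is the usual choice.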

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639