IEEE Access (Jan 2022)
Sequence Embeddings Help Detect Insurance Fraud
Abstract
Roughly 10 percent of the insurance industry’s incurred losses are estimated to stem from fraudulent claims. One solution is to use tabular data to construct models that can distinguish between claims that are legitimate and those that are fraudulent. However, while canonical tabular data models enable robust fraud detection, complex sequential data have been out of the insurance industry’s scope. For health insurance, we propose deep learning architectures that process insurance data consisting of sequential records of patient visits and characteristics. Both the sequential and tabular components improve the quality of the model, generating new insights into the detection of health insurance fraud. Empirical results derived using relevant data from a health insurance company show that our approach outperforms state-of-the-art models and can substantially improve the claims management process. We obtain a ROC AUC metric of 0.873, while the best competitor based on state-of-the-art models achieves 0.815. Moreover, we demonstrate that our architectures are more robust to data corruption. As more and more semi-structured event sequence data become available to insurers, our methods will be valuable for many similar applications, particularly when variables have a large number of categories, such as those from the International Classification of Disease (ICD) codes or other classification codes.
Keywords