Explainable unsupervised anomaly detection for healthcare insurance data

Hannes De Meulemeester; Frank De Smet; Johan van Dorst; Elise Derroitte; Bart De Moor

doi:10.1186/s12911-024-02823-6

BMC Medical Informatics and Decision Making (Jan 2025)

Affiliations

Hannes De Meulemeester: Department of Electrical Engineering, ESAT-STADIUS, KU Leuven
Frank De Smet: Christian Health Insurance Fund
Johan van Dorst: Christian Health Insurance Fund
Elise Derroitte: Christian Health Insurance Fund
Bart De Moor: Department of Electrical Engineering, ESAT-STADIUS, KU Leuven

Journal volume & issue: Vol. 25, no. 1, pp. 1–11

Abstract

Background: Waste and fraud are important problems for health insurers to deal with. With the advent of big data, these insurers are looking more and more towards data mining and machine learning methods to help in detecting waste and fraud. However, labeled data is costly and difficult to acquire as it requires expert investigators and known care providers with atypical behavior.

Methods: In this work we show how recent advances in machine learning can be used to set up a workflow that can aid investigators in discovering practitioners or groups of practitioners with unusual resource use in order to more efficiently combat waste and fraud. We combine three different techniques, which have not been used in the context of healthcare insurance anomaly detection: categorical embeddings to deal with high-cardinality categorical variables, state-of-the-art unsupervised anomaly detection techniques to detect anomalies and Shapley additive explanations (SHAP) to explain the model output.

Results: The method has been evaluated on providers with a known anomalous profile and with the help of experts of the largest health insurance fund in Belgium. The quantitative experiments show that categorical embeddings offer a significant improvement compared to standard methods and that the state-of-the-art unsupervised anomaly detection techniques generally show an improvement over traditional methods. In a practical setting, the proposed workflow with SHAP was able to detect a previously unknown, anomalous trend among general practitioners.

Conclusions: The proposed workflow is able to detect known care providers with atypical behaviour and helps expert investigators in making informed decisions concerning possible fraud or overconsumption in the health insurance field.
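To make the "detect, then explain" part of the workflow concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy anomaly score (the sum of squared z-scores per provider) in place of the state-of-the-art detectors mentioned in the abstract, computes exact Shapley values by enumerating feature coalitions rather than calling the SHAP library, and leaves out the categorical embedding step entirely. The provider names, feature names and values are all hypothetical.

```python
from itertools import combinations
from math import factorial
from statistics import mean, pstdev

# Hypothetical toy data: one row per provider, made-up numeric features.
FEATURES = ["visits_per_patient", "cost_per_visit", "share_home_visits"]
PROVIDERS = {
    "A": [4.0, 30.0, 0.10],
    "B": [5.0, 32.0, 0.12],
    "C": [4.5, 29.0, 0.08],
    "D": [5.5, 31.0, 0.11],
    "E": [4.8, 95.0, 0.09],  # deliberately unusual cost_per_visit
}


def fit_stats(rows):
    """Per-feature mean and population standard deviation."""
    cols = list(zip(*rows))
    return [mean(c) for c in cols], [pstdev(c) for c in cols]


def anomaly_score(x, means, stds):
    """Stand-in detector (an assumption): sum of squared z-scores."""
    return sum(((v - m) / s) ** 2 for v, m, s in zip(x, means, stds))


def shapley_values(x, baseline, score):
    """Exact Shapley values; features outside a coalition take the baseline value."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                without = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without)
                with_i[i] = x[i]
                phi[i] += weight * (score(with_i) - score(without))
    return phi


means, stds = fit_stats(list(PROVIDERS.values()))
scores = {name: anomaly_score(x, means, stds) for name, x in PROVIDERS.items()}

# Rank providers by score and explain the most anomalous one.
top_provider = max(scores, key=scores.get)
phi = shapley_values(
    PROVIDERS[top_provider], means, lambda v: anomaly_score(v, means, stds)
)

for name, value in sorted(zip(FEATURES, phi), key=lambda p: -p[1]):
    print(f"{top_provider}: {name} contributes {value:.3f}")
```

Because this toy score is additive across features, each Shapley value reduces to that feature's squared z-score. The enumeration is kept only to show the general mechanism, which also applies to non-additive detectors but grows exponentially with the number of features.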

Published in BMC Medical Informatics and Decision Making

ISSN: 1472-6947 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: http://bmcmedinformdecismak.biomedcentral.com
