Explainable machine learning models for Medicare fraud detection

John T. Hancock; Richard A. Bauder; Huanjing Wang; Taghi M. Khoshgoftaar

doi:10.1186/s40537-023-00821-5

Journal of Big Data (Oct 2023)

Affiliations

John T. Hancock: College of Engineering and Computer Science, Florida Atlantic University
Richard A. Bauder: College of Engineering and Computer Science, Florida Atlantic University
Huanjing Wang: Ogden College of Science and Engineering, Western Kentucky University
Taghi M. Khoshgoftaar: College of Engineering and Computer Science, Florida Atlantic University

DOI: https://doi.org/10.1186/s40537-023-00821-5
Journal volume & issue: Vol. 10, no. 1, pp. 1–31

Abstract

As a means of building explainable machine learning models for Big Data, we apply a novel ensemble supervised feature selection technique. The technique is applied to publicly available insurance claims data from the United States public health insurance program, Medicare. We approach Medicare insurance fraud detection as a supervised machine learning task of anomaly detection through the classification of highly imbalanced Big Data. Our objectives for feature selection are to increase efficiency in model training and to develop more explainable machine learning models for fraud detection. Using two Big Data datasets derived from two different sources of insurance claims data, we demonstrate how our feature selection technique reduces the dimensionality of the datasets by approximately 87.5% without compromising performance. Moreover, the reduction in dimensionality results in machine learning models that are easier to explain and less prone to overfitting. Our primary contribution, the exposition of our novel feature selection technique, therefore leads to a further contribution to the application domain of automated Medicare insurance fraud detection. We use our feature selection technique to explain our fraud detection models in terms of the definitions of the selected features. The ensemble supervised feature selection technique we present is flexible in that any collection of machine learning algorithms that maintain a list of feature importance values may be used. Researchers may therefore easily employ variations of the technique we present.
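The abstract does not say how the importance lists from the individual learners are combined, so the following is only a minimal sketch of one possible variation, not the authors' procedure. It assumes each model exposes one importance value per feature, ranks the features within each model, and keeps the features with the best mean rank. The model names, feature names, importance values, and the mean-rank aggregation rule are all hypothetical and chosen for illustration.

```python
from statistics import mean


def rank_features(importances):
    """Map each feature to its rank for one model (1 = most important)."""
    # Sort by descending importance; break ties by feature name for determinism.
    ordered = sorted(importances, key=lambda f: (-importances[f], f))
    return {feature: pos for pos, feature in enumerate(ordered, start=1)}


def ensemble_select(importance_lists, keep):
    """Keep the `keep` features with the lowest mean rank across all models.

    Mean-rank aggregation is an assumption made for this sketch.
    """
    rankings = [rank_features(imp) for imp in importance_lists.values()]
    features = sorted(rankings[0])
    mean_rank = {f: mean(r[f] for r in rankings) for f in features}
    return sorted(features, key=lambda f: (mean_rank[f], f))[:keep]


# Hypothetical importance values from three hypothetical models.
importance_lists = {
    "model_a": {"f1": 0.30, "f2": 0.25, "f3": 0.15, "f4": 0.10,
                "f5": 0.08, "f6": 0.06, "f7": 0.04, "f8": 0.02},
    "model_b": {"f1": 0.20, "f2": 0.35, "f3": 0.10, "f4": 0.15,
                "f5": 0.06, "f6": 0.08, "f7": 0.02, "f8": 0.04},
    "model_c": {"f1": 0.28, "f2": 0.18, "f3": 0.22, "f4": 0.08,
                "f5": 0.12, "f6": 0.04, "f7": 0.06, "f8": 0.02},
}

selected = ensemble_select(importance_lists, keep=1)
n_features = len(importance_lists["model_a"])
reduction = 1 - len(selected) / n_features

print(selected)            # features retained by the ensemble
print(f"{reduction:.1%}")  # share of features removed
```

Keeping one feature in eight, as in this toy example, corresponds to the 87.5% reduction in dimensionality quoted in the abstract. Any learner that reports per-feature importance values could supply one of the input dictionaries.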

Published in Journal of Big Data

ISSN: 2196-1115 (Online)
Publisher: SpringerOpen
Country of publisher: United Kingdom
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware; Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://journalofbigdata.springeropen.com
