Auto insurance fraud detection using unsupervised spectral ranking for anomaly

Ke Nian; Haofan Zhang; Aditya Tayal; Thomas Coleman; Yuying Li

doi:10.1016/j.jfds.2016.03.001

Journal of Finance and Data Science (Mar 2016)

Affiliations

Ke Nian: Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, N2L 3G1, Canada
Haofan Zhang: Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, N2L 3G1, Canada
Aditya Tayal: Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, N2L 3G1, Canada
Thomas Coleman: Combinatorics and Optimization, University of Waterloo, Waterloo, ON, N2L 3G1, Canada
Yuying Li: Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, N2L 3G1, Canada

DOI: https://doi.org/10.1016/j.jfds.2016.03.001
Journal volume & issue: Vol. 2, no. 1
pp. 58–75

Abstract

For many data mining problems, obtaining labels is costly and time-consuming, if not practically infeasible. In addition, unlabeled data often includes categorical or ordinal features which, compared with numerical features, can present additional challenges. We propose a new unsupervised spectral ranking method for anomaly (SRA). We illustrate that the spectral optimization in SRA can be viewed as a relaxation of an unsupervised SVM problem. We demonstrate that the first non-principal eigenvector of a Laplacian matrix is linked to a bi-class classification strength measure which can be used to rank anomalies. Using the first non-principal eigenvector of the Laplacian matrix directly, the proposed SRA generates an anomaly ranking either with respect to the majority class or with respect to two main patterns. The choice of the ranking reference can be made based on whether the cardinality of the smaller class (positive or negative) is sufficiently large. Using an auto insurance claim data set, but ignoring the labels when generating the ranking, we show that our proposed SRA significantly surpasses existing outlier-based fraud detection methods. Finally, we demonstrate that, while the proposed SRA yields good performance for a few similarity measures for the auto insurance claim data, notably ones based on the Hamming distance, choosing appropriate similarity measures for a fraud detection problem remains crucial.
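
The ranking idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the details, so the following are assumptions made for illustration only: the similarity is one minus the fraction of mismatching categorical features (a simple Hamming-based choice), the Laplacian is the symmetric normalized one, the two classes are read off the sign of the eigenvector entries, and the score formulas and the hypothetical threshold min_class_fraction stand in for the "sufficiently large" test on the smaller class.

```python
import numpy as np


def hamming_similarity(X):
    """Pairwise similarity = 1 - fraction of mismatching features (assumed measure)."""
    X = np.asarray(X)
    mismatch = (X[:, None, :] != X[None, :, :]).mean(axis=2)
    return 1.0 - mismatch


def spectral_anomaly_scores(W, min_class_fraction=0.3):
    """Score points with the first non-principal Laplacian eigenvector.

    Higher score = more anomalous. `min_class_fraction` is a hypothetical
    threshold deciding which ranking reference is used.
    """
    W = np.asarray(W, dtype=float)
    degree = W.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2} (assumed form).
    laplacian = np.eye(len(W)) - inv_sqrt[:, None] * W * inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; column 1 is the
    # eigenvector of the second-smallest eigenvalue (first non-principal).
    _, vecs = np.linalg.eigh(laplacian)
    g = vecs[:, 1]
    n_pos = int((g > 0).sum())
    n_neg = int((g < 0).sum())
    if min(n_pos, n_neg) / len(g) >= min_class_fraction:
        # Two main patterns: entries near zero belong to neither pattern.
        return np.abs(g).max() - np.abs(g)
    # One majority class: points on the minority side rank highest.
    return -g if n_pos > n_neg else g


# Toy data: 20 identical "normal" records and 3 records differing in 3 of 4 features.
X = np.array([[0, 0, 0, 0]] * 20 + [[1, 1, 1, 0]] * 3)
scores = spectral_anomaly_scores(hamming_similarity(X))
ranking = np.argsort(scores)[::-1]  # most anomalous first
```

Because the majority-class branch chooses the sign from the class sizes, the resulting ranking does not depend on the arbitrary sign of the eigenvector returned by the solver.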

Published in Journal of Finance and Data Science

ISSN: 2405-9188 (Online)
Publisher: KeAi Communications Co., Ltd.
Country of publisher: China
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science; Social Sciences: Finance
Website: https://www.keaipublishing.com/en/journals/the-journal-of-finance-and-data-science/
