Вісник Харківського національного університету імені В.Н. Каразіна: Серія Економіка (Dec 2021)
Insurance cases: analysis by machine learning
Abstract
Fraud is one of the main problems in insurance: a client seeks an overpayment by distorting information about the insured event. Traditional methods of combating insurance fraud require a great deal of routine manual work and are not very effective. The paper proposes a prototype of an insurance case monitoring system that detects fraud using machine-learning methods. The prototype was developed on a database of insurance cases that contains 1000 insurance claim records and 38 variables. The dataset provides information on 1) the client – 10 features; 2) the insurance contract – 7 features; 3) the incident – 21 features. Preliminary data processing, modeling, and development of the monitoring system were carried out in Python. Classifiers (logistic regression, gradient boosting, and random forest) were built with different combinations of variables. For each model, the confusion matrix, accuracy, specificity, sensitivity, and ROC curve were analyzed. The modeling results made it possible to select 5 main variables for monitoring: 3 characterize the client and 2 the incident. The proposed monitoring system identifies the following patterns: 1) in most cases, fraudsters were managers and technical support staff; 2) customers who practiced chess or CrossFit were more prone to fraud; 3) most fraud was recorded in cases of severe damage; 4) when emergency services had not been contacted, a large claim amount indicated fraud.
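The model comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the library (scikit-learn), the synthetic data produced by `make_classification`, the 75/25 class balance, the 70/30 train/test split, the hyperparameters, and the helper name `evaluate` are all assumptions made for illustration. Only the three classifier types, the dataset shape (1000 records, 38 variables), and the reported metrics come from the abstract.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the claims database (assumption): 1000 records,
# 38 numeric features, label 1 = fraud. The real data are not reproduced here.
X, y = make_classification(
    n_samples=1000, n_features=38, n_informative=5,
    weights=[0.75, 0.25], random_state=0,
)

# Hold out 30% of the records for evaluation, preserving the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0,
)

# The three classifier families named in the abstract (hyperparameters assumed).
models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}


def evaluate(model, X_train, y_train, X_test, y_test):
    """Fit a classifier and return the metrics listed in the abstract."""
    model.fit(X_train, y_train)
    predicted = model.predict(X_test)
    # Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
    tn, fp, fn, tp = confusion_matrix(y_test, predicted, labels=[0, 1]).ravel()
    fraud_scores = model.predict_proba(X_test)[:, 1]
    return {
        "confusion_matrix": [[int(tn), int(fp)], [int(fn), int(tp)]],
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # share of fraud cases detected
        "specificity": tn / (tn + fp),   # share of honest claims cleared
        "roc_auc": roc_auc_score(y_test, fraud_scores),
    }


results = {
    name: evaluate(model, X_train, y_train, X_test, y_test)
    for name, model in models.items()
}

for name, metrics in results.items():
    print(
        f"{name}: accuracy={metrics['accuracy']:.3f} "
        f"sensitivity={metrics['sensitivity']:.3f} "
        f"specificity={metrics['specificity']:.3f} "
        f"ROC AUC={metrics['roc_auc']:.3f}"
    )
```

Sensitivity and specificity are computed directly from the confusion matrix so that the link between the matrix and the derived metrics stays visible. With real claims data, categorical variables such as occupation or hobbies would first need to be encoded numerically.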
Keywords