Mathematics and Modeling in Finance (Dec 2023)

Modeling auto insurance frequency using K-means and mixture regression

  • Maryem Jaziri,
  • Afif Masmoudi

DOI
https://doi.org/10.22054/jmmf.2024.76043.1106
Journal volume & issue
Vol. 3, no. 2
pp. 93 – 109

Abstract

Classifying policyholders is important to actuaries because it supports sound decisions when predicting optimal premiums. This paper first proposes an optimal construction of policyholder classes. Second, it proposes a Poisson-negative binomial mixture regression model as an alternative for handling the overdispersion in these classes. The method is distinctive in that it uses Tunisian data and classifies the insured population with K-means, an unsupervised machine learning algorithm. Model choice is made difficult by the presence of a zero mass in one of the classes and by the significant degree of overdispersion. We therefore propose a mixture regression model that estimates the density of each class and predicts its probability distribution, which helps us understand the underlying properties of the data. In the learning phase, we estimate the model parameters using the Expectation-Maximization algorithm. This allows us to determine the probability of occurrence for each new insured and so to create the most accurate classification. The goal of using mixture regression is to obtain a classification that is as heterogeneous as possible while achieving a better approximation. The proposed mixture regression model, which uses a number of factors, has been evaluated on several criteria, including mean square error, variance, the chi-square test, and accuracy. According to the experimental findings on several datasets, the approach can reach an overall accuracy of 80%. The application to real Tunisian data then shows the effectiveness of the mixture regression model.
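
To make the Expectation-Maximization step concrete, the following is a minimal sketch under strong simplifying assumptions, not the authors' model: it fits a mixture of two plain Poisson components to claim counts, with no covariates and no negative binomial component. The function names, starting values, and synthetic data are all hypothetical and chosen only for illustration.

```python
import math
import random


def poisson_pmf(k, lam):
    # Evaluated in log space to stay stable for larger counts.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def sample_poisson(lam, rng):
    # Knuth's multiplication method; adequate for small rates.
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def em_poisson_mixture(counts, n_iter=200):
    """EM for a two-component Poisson mixture (simplified: no regression part)."""
    n = len(counts)
    mean = sum(counts) / n
    lam = [0.5 * mean + 1e-3, 1.5 * mean + 1e-3]  # crude, hypothetical start
    weight = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each count.
        resp = []
        for k in counts:
            joint = [weight[j] * poisson_pmf(k, lam[j]) for j in range(2)]
            total = sum(joint)
            resp.append([p / total for p in joint])
        # M-step: closed-form updates of mixing weights and Poisson rates.
        for j in range(2):
            nj = sum(r[j] for r in resp)
            weight[j] = nj / n
            lam[j] = sum(r[j] * k for r, k in zip(resp, counts)) / nj
    return weight, lam, resp


# Synthetic claim counts (assumed values): a low-risk and a high-risk group.
rng = random.Random(0)
counts = [sample_poisson(0.1, rng) for _ in range(1400)]
counts += [sample_poisson(2.0, rng) for _ in range(600)]

weight, lam, resp = em_poisson_mixture(counts)
print("mixing weights:", weight)
print("Poisson rates:", lam)
```

Each row of `resp` holds the estimated component membership probabilities for one policyholder; the same E-step, run with the fitted parameters, could be used to score a new insured. Replacing a Poisson component with a negative binomial one, or adding covariates, would require a numerical M-step rather than the closed-form update used here.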

Keywords