International Journal of Computational Intelligence Systems (Jun 2008)
A Comparative Study of Various Probability Density Estimation Methods for Data Analysis
Abstract
Probability density function (PDF) estimation is a task of primary importance in many contexts, including Bayesian learning and novelty detection. Despite the wide variety of methods available for estimating PDFs, only a few are widely used in practice by data analysts. Among the most used are histograms, Parzen windows, vector-quantization-based Parzen, and finite Gaussian mixtures. This paper compares these estimation methods from a practical point of view, i.e. when the user is faced with various requirements from applications. In particular, it addresses the question of which method to use when the learning sample is large or small, and it examines the computational complexity resulting from the choice (by cross-validation methods) of external parameters such as the number of kernels and their widths in kernel mixture models, the robustness to initial conditions, etc. The expected behaviour of the estimation algorithms is derived from an algorithmic perspective; numerical experiments are used to illustrate these results.
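To make the cross-validated choice of kernel width concrete, the following is a minimal sketch and not the procedure used in the paper. It assumes one-dimensional data, a Gaussian kernel, and a leave-one-out log-likelihood criterion. The function names (`parzen_pdf`, `loo_log_likelihood`, `select_width`), the candidate grid, and the synthetic sample are all hypothetical choices made for illustration.

```python
import numpy as np


def parzen_pdf(x, samples, width):
    """Gaussian Parzen window estimate of a 1-D density at the points x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    samples = np.asarray(samples, dtype=float)
    # Pairwise scaled distances between evaluation points and samples.
    z = (x[:, None] - samples[None, :]) / width
    kernels = np.exp(-0.5 * z**2) / (width * np.sqrt(2.0 * np.pi))
    # The estimate is the average of the kernels centred on the samples.
    return kernels.mean(axis=1)


def loo_log_likelihood(samples, width):
    """Leave-one-out log-likelihood of the samples for a given width."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    z = (samples[:, None] - samples[None, :]) / width
    kernels = np.exp(-0.5 * z**2) / (width * np.sqrt(2.0 * np.pi))
    # Exclude each point from its own density estimate.
    np.fill_diagonal(kernels, 0.0)
    loo = kernels.sum(axis=1) / (n - 1)
    # Small constant (an assumption) guards against log(0) for tiny widths.
    return float(np.sum(np.log(loo + 1e-300)))


def select_width(samples, candidates):
    """Return the candidate width with the highest leave-one-out score."""
    scores = [loo_log_likelihood(samples, w) for w in candidates]
    return candidates[int(np.argmax(scores))]


# Synthetic sample and candidate grid, chosen only for illustration.
rng = np.random.default_rng(0)
data = rng.normal(size=200)
widths = np.logspace(-2, 1, 30)
best = select_width(data, widths)

grid = np.linspace(-10.0, 10.0, 2001)
density = parzen_pdf(grid, data, best)
```

In this sketch each candidate width costs O(n²) kernel evaluations for a sample of size n, so the total cost grows linearly with the number of candidates. This is the kind of overhead from external parameter selection that the comparison takes into account.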
Keywords