IEEE Access (Jan 2021)

Finite Sample Based Mutual Information

  • Khairan Rajab
  • Firuz Kamalov

DOI
https://doi.org/10.1109/ACCESS.2021.3107031
Journal volume & issue
Vol. 9
pp. 118871 – 118879

Abstract

Mutual information is a popular metric in machine learning. In the case of a discrete target variable and a continuous feature variable, the mutual information can be calculated as a sum-integral of the weighted log-likelihood ratio of the joint and marginal densities. In practice, however, the true densities are unavailable and only a finite sample of the population is given. In this paper, we propose a novel method for calculating the mutual information for continuous variables using a finite sample of the population. The proposed method approximates the underlying continuous density using Kernel Density Estimation. Unlike previous kernel-based approaches for estimating mutual information, our method directly calculates the integral involved in the formula. Numerical experiments demonstrate that the proposed method produces more accurate results than currently used feature selection approaches. In addition, our method demonstrates substantially faster computation times than the benchmark methods.
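As a rough illustration of the idea described in the abstract, the sketch below estimates I(X; Y) = Σ_y ∫ p(x, y) log( p(x, y) / (p(x) p(y)) ) dx for a continuous feature x and a discrete target y by fitting a kernel density estimate to each class and integrating numerically over a grid. This is not the authors' implementation: the Gaussian kernel, Silverman's rule-of-thumb bandwidth, the trapezoidal rule, and all function and parameter names (`kde_mutual_information`, `grid_size`, `pad`) are assumptions made for the example.

```python
import math
import random
import statistics


def silverman_bandwidth(sample):
    # Silverman's rule of thumb; an assumed choice, not taken from the paper.
    return 1.06 * statistics.stdev(sample) * len(sample) ** (-0.2)


def gaussian_kde_pdf(sample, bandwidth):
    """Return a function t -> Gaussian KDE density estimate built from `sample`."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))

    def pdf(t):
        return norm * sum(math.exp(-0.5 * ((t - s) / bandwidth) ** 2) for s in sample)

    return pdf


def kde_mutual_information(x, y, grid_size=400, pad=3.0):
    """Estimate I(X; Y) in nats for continuous values x and discrete labels y."""
    # Split the sample by class label.
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(yi, []).append(xi)

    # Empirical class priors p(y) and class-conditional KDEs p(x | y).
    priors = {c: len(v) / len(x) for c, v in groups.items()}
    pdfs = {c: gaussian_kde_pdf(v, silverman_bandwidth(v)) for c, v in groups.items()}

    # Integration grid that extends `pad` standard deviations past the data.
    spread = statistics.stdev(x)
    lo, hi = min(x) - pad * spread, max(x) + pad * spread
    step = (hi - lo) / (grid_size - 1)

    mi = 0.0
    for k in range(grid_size):
        t = lo + k * step
        cond = {c: pdfs[c](t) for c in groups}
        # Marginal density p(x) = sum over y of p(y) p(x | y).
        marginal = sum(priors[c] * cond[c] for c in groups)
        # Trapezoidal rule: the two end points get half weight.
        weight = 0.5 if k in (0, grid_size - 1) else 1.0
        for c in groups:
            if cond[c] > 0.0 and marginal > 0.0:
                # p(x, y) log( p(x, y) / (p(x) p(y)) ) with p(x, y) = p(y) p(x | y).
                mi += weight * step * priors[c] * cond[c] * math.log(cond[c] / marginal)
    return mi


# Synthetic data: one feature that separates the two classes, one that does not.
random.seed(0)
labels = [i % 2 for i in range(400)]
x_dependent = [random.gauss(-3.0 if c == 0 else 3.0, 1.0) for c in labels]
x_independent = [random.gauss(0.0, 1.0) for _ in labels]

mi_dependent = kde_mutual_information(x_dependent, labels)
mi_independent = kde_mutual_information(x_independent, labels)
print(mi_dependent, mi_independent)
```

With two balanced classes the estimate is bounded above by the entropy of the target, log 2 nats, so the well-separated feature should score close to that bound and the independent feature close to zero. For feature selection, features would be ranked by this score.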

Keywords