IEEE Access (Jan 2023)

Model-Based Clustering of Mixed Data With Sparse Dependence

  • Young-Geun Choi,
  • Soohyun Ahn,
  • Jayoun Kim

DOI
https://doi.org/10.1109/ACCESS.2023.3296790
Journal volume & issue
Vol. 11
pp. 75945 – 75954

Abstract

Mixed data refers to a mixture of continuous and categorical variables. Clustering mixed data is a long-standing statistical problem. The latent Gaussian mixture model, a model-based approach to this problem, has received attention owing to its simplicity and interpretability. However, this approach is prone to dimensionality problems: parameters must be estimated for each group, and the number of covariance parameters is quadratic in the number of variables. To address this, we propose “regClustMD,” a novel model-based clustering method that accounts for sparse dependence among variables. We consider a sparse latent Gaussian mixture model, assuming that the precision matrix of the variables is sparse, with few nonzero elements. We propose maximizing a penalized complete log-likelihood using the Monte Carlo expectation-maximization (MCEM) algorithm. Our numerical experiments and real data analyses demonstrate that our method outperforms a counterpart algorithm in both accuracy and failure rate under a correlated data structure.
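The quadratic growth in covariance parameters mentioned in the abstract can be illustrated with a short count. This is only a sketch under simplifying assumptions that are not taken from the paper: each variable is given one latent dimension, each group has its own unrestricted symmetric covariance matrix, and no identifiability constraints are applied. The function name `covariance_param_count` is hypothetical.

```python
def covariance_param_count(num_variables: int, num_groups: int) -> int:
    """Count free covariance parameters across all groups.

    Assumption (illustrative only): every group has its own unrestricted
    symmetric covariance matrix of size num_variables x num_variables.
    """
    # A symmetric p x p matrix has p * (p + 1) / 2 free entries.
    per_group = num_variables * (num_variables + 1) // 2
    return num_groups * per_group


if __name__ == "__main__":
    # The count grows quadratically with the number of variables.
    for p in (10, 50, 100):
        print(p, covariance_param_count(p, num_groups=3))
```

With three groups, going from 10 to 100 variables raises the count from 165 to 15150, which is the kind of growth that a sparsity assumption on the precision matrix is meant to curb.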

Keywords