Journal of Statistical Software (Apr 2016)

GMCM: Unsupervised Clustering and Meta-Analysis Using Gaussian Mixture Copula Models

  • Anders Ellern Bilgrau,
  • Poul Svante Eriksen,
  • Jakob Gulddahl Rasmussen,
  • Hans Erik Johnsen,
  • Karen Dybkaer,
  • Martin Boegsted

DOI
https://doi.org/10.18637/jss.v070.i02
Journal volume & issue
Vol. 70, no. 2
pp. 1 – 23

Abstract

Methods for clustering in unsupervised learning are an important part of the statistical toolbox in numerous scientific disciplines. Tewari, Giering, and Raghunathan (2011) proposed using so-called Gaussian mixture copula models (GMCMs) for general unsupervised learning based on clustering. Li, Brown, Huang, and Bickel (2011) independently discussed a special case of these GMCMs as a novel approach to meta-analysis in high-dimensional settings. GMCMs have attractive properties that make them highly flexible and therefore interesting alternatives to other well-established methods. However, parameter estimation is hard because of intrinsic identifiability issues and intractable likelihood functions. Both aforementioned papers discuss similar expectation-maximization-like (EM-like) algorithms as their pseudo maximum likelihood estimation procedure. We present and discuss an improved implementation in R of both classes of GMCMs, along with various optimization routines that serve as alternatives to the EM algorithm. The software is freely available in the R package GMCM. The implementation is fast, general, and optimized for very large numbers of observations. We demonstrate the use of package GMCM through different applications.

Keywords