Clustering Mixed-Type Data via Dirichlet Process Mixture Model with Cluster-Specific Covariance Matrices

Nurul Afiqah Burhanuddin; Kamarulzaman Ibrahim; Hani Syahida Zulkafli; Norwati Mustapha

doi:10.3390/sym16060712

Symmetry (Jun 2024)

Clustering Mixed-Type Data via Dirichlet Process Mixture Model with Cluster-Specific Covariance Matrices

Nurul Afiqah Burhanuddin,
Kamarulzaman Ibrahim,
Hani Syahida Zulkafli,
Norwati Mustapha

Affiliations

Nurul Afiqah Burhanuddin: Department of Mathematical Sciences, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Bangi 43600, Selangor, Malaysia
Kamarulzaman Ibrahim: Department of Mathematical Sciences, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Bangi 43600, Selangor, Malaysia
Hani Syahida Zulkafli: Department of Mathematics and Statistics, Faculty of Science, Universiti Putra Malaysia, Serdang 43400, Selangor, Malaysia
Norwati Mustapha: Department of Computer Science, Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, Serdang 43400, Selangor, Malaysia

DOI: https://doi.org/10.3390/sym16060712
Journal volume & issue: Vol. 16, no. 6
p. 712

Abstract

Read online

Many studies have shown successful applications of the Dirichlet process mixture model (DPMM) for clustering continuous data. Beyond continuous data, in practice, one can expect to see different data types, including ordinal and nominal data. Existing DPMMs for clustering mixed-type data assume a strict covariance matrix structure, resulting in an overfit model. This article explores a DPMM for mixed-type data that allows the covariance matrix to differ from one cluster to another. We assume an underlying latent variable framework for ordinal and nominal data, which is then modeled jointly with the continuous data. The identifiability issue on the covariance matrix poses computational challenges, thus requiring a nonstandard inferential algorithm. The applicability and flexibility of the proposed model are illustrated through simulation examples and real data applications.

Published in Symmetry

ISSN: 2073-8994 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Mathematics
Website: http://www.mdpi.com/journal/symmetry/

About the journal

Abstract

Keywords