IEEE Access (Jan 2025)
Enhancing Cluster Accuracy in Diabetes Multimorbidity With Dirichlet Process Mixture Models
Abstract
Clustering diabetic multimorbidity data from electronic health records (EHRs) is challenging because of patient heterogeneity, high-dimensional variables, sensitivity to parameter settings, and high computational demands. The complexity and imbalance of these data limit the effectiveness of traditional clustering techniques, which tend to produce suboptimal clusters that yield few clinically meaningful insights. This study addresses this gap by applying the Dirichlet Process Mixture Model (DPMM), a non-parametric clustering approach that does not require the number of clusters to be specified in advance and adapts to the underlying data structure. DPMM offers three major advantages: (1) it automatically adjusts the number of clusters to the data, which lets it capture diverse patient profiles and suits the variability in multimorbidity patterns; (2) it estimates distributional properties directly from the data, reducing reliance on parameter choices and improving the stability of clustering results across datasets; and (3) it uses a Bayesian framework to converge iteratively toward optimal clustering solutions, efficiently managing large datasets and producing more clinically meaningful clusters. Additionally, Gibbs sampling is employed for robust convergence across parameter settings, minimizing the dependency on initial configurations and improving the consistency of clustering outcomes across data contexts. Results show that DPMM consistently outperforms traditional methods in clustering high-dimensional and imbalanced datasets, offering significant translational potential for guiding tailored healthcare strategies for complex chronic diseases and optimizing healthcare resource allocation.
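As an illustration of the mechanism summarized above, the following is a minimal sketch of a collapsed Gibbs sampler for a DPMM. It is not the authors' implementation. It assumes one-dimensional data, a Gaussian likelihood with known variance, and a Gaussian prior on cluster means, none of which are specified in the abstract. The function name `dpmm_gibbs` and the parameters `alpha`, `sigma2`, `mu0`, and `tau2` are hypothetical. The point it shows is that the number of clusters is never passed in: each observation either joins an existing cluster or opens a new one.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def dpmm_gibbs(data, alpha=1.0, sigma2=1.0, mu0=0.0, tau2=25.0, sweeps=50, seed=0):
    """Collapsed Gibbs sampling for a 1-D DP mixture of Gaussians.

    Assumptions (illustrative only): known within-cluster variance sigma2,
    N(mu0, tau2) prior on cluster means, DP concentration alpha.
    Returns one cluster label per observation.
    """
    rng = random.Random(seed)
    n = len(data)
    labels = [0] * n               # start with every point in one cluster
    counts = {0: n}                # cluster label -> number of members
    sums = {0: sum(data)}          # cluster label -> sum of member values
    next_label = 1                 # label reserved for the next new cluster

    for _ in range(sweeps):
        for i, x in enumerate(data):
            # Remove point i from its current cluster.
            k_old = labels[i]
            counts[k_old] -= 1
            sums[k_old] -= x
            if counts[k_old] == 0:
                del counts[k_old], sums[k_old]

            # Weight for each existing cluster: size times the posterior
            # predictive density of x given the cluster's other members.
            candidates, weights = [], []
            for k, n_k in counts.items():
                post_var = 1.0 / (1.0 / tau2 + n_k / sigma2)
                post_mean = post_var * (mu0 / tau2 + sums[k] / sigma2)
                candidates.append(k)
                weights.append(n_k * normal_pdf(x, post_mean, post_var + sigma2))

            # Weight for opening a new cluster: alpha times the prior predictive.
            candidates.append(next_label)
            weights.append(alpha * normal_pdf(x, mu0, tau2 + sigma2))

            k_new = rng.choices(candidates, weights=weights)[0]
            if k_new == next_label:
                counts[k_new] = 0
                sums[k_new] = 0.0
                next_label += 1
            counts[k_new] += 1
            sums[k_new] += x
            labels[i] = k_new

    return labels


# Synthetic example data (not from the study): two groups of values.
_gen = random.Random(42)
example_data = [_gen.gauss(-4.0, 1.0) for _ in range(30)] + \
               [_gen.gauss(4.0, 1.0) for _ in range(30)]
example_labels = dpmm_gibbs(example_data)
print("clusters found:", len(set(example_labels)))
```

In this sketch, a larger `alpha` makes new clusters more likely, so the concentration parameter shapes the inferred cluster count without fixing it in advance. Real EHR data would need a likelihood suited to high-dimensional, possibly categorical variables.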
Keywords