IEEE Access (Jan 2019)
Hybrid Chain-Hypergraph P Systems for Multiobjective Ensemble Clustering
Abstract
Clustering is a classic combinatorial optimization problem that is widely used in pattern recognition, image processing, market analysis, and other fields. However, the efficiency of clustering algorithms decreases as the amount of data increases. In addition, most existing methods optimize only one objective and may therefore suit only data sets with certain features. To address these limitations, we develop a new hybrid chain-hypergraph P system (HCHPS), which exploits the parallelism of P systems together with the advantages of chain and hypergraph topologies for accurate and efficient clustering. The new P system comprises three types of subsystems: reaction chain membrane subsystems, local communication membrane subsystems, and global ensemble membrane subsystems. Each type of subsystem is implemented end-to-end in HCHPS, in parallel, with new rules and membrane structures. In particular, to obtain efficient cluster center objects and make the algorithm robust to data with various features, the reaction chain membrane subsystems apply three different multiobjective strategies simultaneously through new chain evolution rules. To increase the population diversity of cluster centers, the local communication membrane subsystems use transport rules between membranes for the coevolution of nondominated objects. The global ensemble membrane subsystems apply a new dense representation multisize ensemble strategy to further improve the accuracy of the final results. Evaluations on two artificial data sets and 17 real-life data sets demonstrate the robustness of the proposed method in correctly clustering data sets with different dimensions and shapes. In our experiments, HCHPS outperforms both baseline and state-of-the-art methods. Benefiting from this parallelism, HCHPS is also less time-consuming than other methods, with an average completion time of 28.07 seconds on the 17 real-life data sets. Moreover, an ablation study shows that the proposed components are critical for effective cluster analysis.
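As a brief illustration of the term "nondominated objects" used above, the following Python sketch filters a set of candidate solutions down to its Pareto front. It is only a generic sketch under stated assumptions: every objective is assumed to be minimized, and the function names and the two-objective example scores are hypothetical. It does not reproduce the chain evolution or transport rules of HCHPS.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b on every objective and
    strictly better on at least one (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def nondominated(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates."""
    return [
        p
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical objective scores for four candidate sets of cluster centers.
scores = [(0.2, 0.9), (0.4, 0.4), (0.5, 0.5), (0.9, 0.1)]
front = nondominated(scores)
print(front)  # (0.5, 0.5) is dropped because (0.4, 0.4) dominates it
```

The pairwise comparison is quadratic in the number of candidates, which is adequate for small populations; larger populations would typically use a sorting-based approach instead.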
Keywords