Iraqi Journal for Computer Science and Mathematics (Jul 2024)
Convex Optimization Techniques for High-Dimensional Data Clustering Analysis: A Review
Abstract
Clustering techniques are instrumental in discerning patterns and relationships within datasets in data analytics and unsupervised machine learning. Traditional clustering algorithms struggle with real-world data analysis problems in which the number of clusters is not known in advance, and they have particular difficulty determining the optimal number of clusters for high-dimensional datasets. Consequently, there is a demand for more adaptable and efficient techniques. Convex clustering, rooted in a rich mathematical framework, has steadily emerged as an important alternative to traditional techniques. It combines the strengths of conventional approaches with robustness and, because its objective is convex, a guarantee of globally optimal solutions. This review offers an in-depth exploration of convex clustering, detailing its formulation, challenges and practical applications. It also examines synthetic datasets, which serve as foundational platforms for academic exploration, with emphasis on their use with the semi-smooth Newton augmented Lagrangian (SSNAL) algorithm. Convex clustering provides a robust theoretical foundation, but challenges persist, including computational limitations on large datasets and noise management in high-dimensional settings. The paper therefore discusses these current challenges and prospective future directions in the domain. This review aims to illuminate the strengths and potential of convex clustering in modern data analytics, highlighting its robustness, flexibility and adaptability across diverse datasets and applications.
Keywords