Brazilian Archives of Biology and Technology (Jun 2024)
Advancing Gene Expression Data Analysis: an Innovative Multi-objective Optimization Algorithm for Simultaneous Feature Selection and Clustering
Abstract
Abstract Clustering algorithms play a crucial role in identifying co-expressed genes in microarray data, while feature subset identification is equally important when dealing with large data matrices. In this research paper, we address the problem of simultaneous feature selection and gene expression data clustering within a multi-objective optimization framework. Our approach employs the Archived multi-objective simulated annealing (AMOSA) algorithm to optimize a multi-objective function that incorporates two internal validity indices and a feature weight index. To determine data point membership in different clusters, we utilize a point symmetry-based distance metric. We demonstrate the effectiveness of our proposed approach on three publicly available gene expression datasets using the Silhouette index. Furthermore, we compare the clustering results of our approach, unsupervised feature selection and clustering using Multi-objective optimization framework (UFSC-MOO), to nine other existing techniques, showing its superior performance. Statistical significance is confirmed through Wilcoxon Rank Sum test. Also, biological significance test is employed to show that the obtained clustering solutions are biologically enriched.
Keywords