ICTACT Journal on Soft Computing (Jan 2022)
GENE BICLUSTERING ON LARGE DATASETS USING FUZZY C-MEANS CLUSTERING
Abstract
This study uses biclustering to alleviate some of the drawbacks of clustering gene expression data. Several biclustering algorithms are applied to detect gene activity that is unique to particular contexts and to reduce the duplication of broad gene information. Machine learning and heuristic algorithms are now widely used for biclustering because they maintain populations of candidate solutions, which allows a larger portion of the search space to be examined. Biclusters in gene expression data frequently contain genes whose expression is the same across a variety of conditions, so biclustering is particularly effective when the rows and columns of the matrix are grouped simultaneously. Submatrices are identified using the Large Average Submatrix (LAS) algorithm, and a Fuzzy C-Means algorithm is then used so that each submatrix can be expanded to include more rows and columns for further analysis. The system design takes into account the precision and strength of the submatrices and their components, and biclustering techniques are used to differentiate gene expression information. The simulation is run in Java on the Garber dataset. The study is assessed using the average match score for non-overlapping modules, the influence of noise on overlapping modules for constant and additive biclusters, and the overall run time.
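To make the Fuzzy C-Means step concrete, the following is a minimal Java sketch of the textbook membership update for a single expression profile. It is not the implementation used in this study: the class name `FuzzyMembership`, the method names, the Euclidean distance, and the fuzzifier value m = 2 are all assumptions chosen for illustration.

```java
import java.util.Arrays;

/** Illustrative sketch only: textbook Fuzzy C-Means membership update for one point. */
final class FuzzyMembership {

    /** Euclidean distance between two vectors of equal length. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /**
     * Membership of point x in each cluster j:
     * u_j = 1 / sum_k (d_j / d_k)^(2 / (m - 1)), with fuzzifier m > 1.
     */
    static double[] memberships(double[] x, double[][] centers, double m) {
        if (m <= 1.0) {
            throw new IllegalArgumentException("fuzzifier m must be > 1");
        }
        int c = centers.length;
        double[] d = new double[c];
        int zeroCount = 0;
        for (int j = 0; j < c; j++) {
            d[j] = distance(x, centers[j]);
            if (d[j] == 0.0) {
                zeroCount++;
            }
        }
        double[] u = new double[c];
        if (zeroCount > 0) {
            // x coincides with one or more centres: split full membership among them.
            for (int j = 0; j < c; j++) {
                u[j] = (d[j] == 0.0) ? 1.0 / zeroCount : 0.0;
            }
            return u;
        }
        double exponent = 2.0 / (m - 1.0);
        for (int j = 0; j < c; j++) {
            double sum = 0.0;
            for (int k = 0; k < c; k++) {
                sum += Math.pow(d[j] / d[k], exponent);
            }
            u[j] = 1.0 / sum;
        }
        return u;
    }

    public static void main(String[] args) {
        // Hypothetical two-condition expression profile and two cluster centres.
        double[] gene = {0.0, 0.0};
        double[][] centers = {{1.0, 0.0}, {3.0, 0.0}};
        System.out.println(Arrays.toString(memberships(gene, centers, 2.0)));
    }
}
```

With the hypothetical values in `main`, the profile lies at distance 1 from the first centre and 3 from the second, so its memberships are roughly 0.9 and 0.1. Because a row or column can hold partial membership in several clusters, soft assignments of this kind are one plausible way to decide which additional rows and columns to admit when expanding a submatrix.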
Keywords