Journal of Integrative Bioinformatics (Mar 2007)
A Two-Step Clustering for 3-D Gene Expression Data Reveals the Main Features of the Arabidopsis Stress Response
Abstract
We developed an integrative approach for discovering gene modules, i.e. genes that are tightly correlated under several experimental conditions and applied it to a threedimensional Arabidopsis thaliana microarray dataset. The dataset consists of approximately 23000 genes responding to 9 abiotic stress conditions at 6-9 different points in time. Our approach aims at finding relatively small and dense modules lending themselves to a specific biological interpretation. In order to detect gene modules within this dataset, we employ a two-step clustering process. In the first step, a k-means clustering on one condition is performed, which is subsequently used in the second step as a seed for the clustering of the remaining conditions. To validate the significance of the obtained modules, we performed a permutation analysis and determined a null hypothesis to compare the module scores against, providing a p-value for each module. Significant modules were mapped to the Gene Ontology (GO) in order to determine the participating biological processes.