Journal of Isfahan Medical School (Jan 2016)
Dimensionality Reduction on Topological Features of the Gene Network Constructed from Microarray Data for Prediction of Breast Cancer Recurrence
Abstract
Background: Features extracted from the gene expression profiles of DNA microarrays are traditional tools in cancer classification. Using the topological properties of genes, obtained by reconstructing the gene network, can provide more reliable findings. The main goal of this article is to predict breast cancer recurrence using topological features of the relevance network reconstructed from gene expression profiles.
Methods: We used seven gene expression microarray datasets, comprising 1271 samples from seven studies on breast cancer. The relevance gene network was reconstructed, and Fisher discriminant analysis (FDA) was applied for gene selection based on the characteristics of the network topology. Constructing the gene network requires an expression profile for each gene, which cannot be obtained from a single sample. Therefore, to classify a test sample, the sample was added to the training data and new gene networks were reconstructed for the two groups of high-risk and low-risk samples. The correlation coefficient between the vectors of topological quantities of the networks before and after adding the test sample was then calculated. The test sample was assigned to the group for which the newly reconstructed network had the higher correlation with the original labeled network.
Findings: Classification accuracy was estimated using 5-fold cross-validation for the correlation-threshold rule as well as for the k-nearest neighbor (kNN) and non-linear support vector machine (SVM) classifiers applied to the topological properties of the reconstructed gene networks. The results confirmed the advantage of supplying topological features to the kNN and non-linear SVM classifiers. The highest prediction accuracy with the kNN classifier was obtained with the degree centrality property, reaching 98.5% on average across the different numbers of selected genes.
Conclusion: Topological features of gene networks reconstructed from gene expression profiles provided more stable and accurate predictions of breast cancer recurrence.
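The classification rule described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions that the abstract does not state: the relevance network is built by thresholding absolute Pearson correlations between genes (the `threshold` value is hypothetical), degree centrality is the only topological quantity used, the test sample is added to each risk group in turn, and the function names `degree_centrality`, `safe_correlation`, and `classify_sample` are illustrative only.

```python
import numpy as np


def degree_centrality(expr, threshold=0.5):
    """Degree centrality of each gene in a thresholded relevance network.

    expr: array of shape (n_samples, n_genes).
    An edge joins two genes when |Pearson r| exceeds `threshold`
    (an assumed construction; the abstract does not give the rule).
    """
    corr = np.corrcoef(expr, rowvar=False)      # gene-by-gene correlations
    adj = np.abs(corr) > threshold              # boolean adjacency matrix
    np.fill_diagonal(adj, False)                # no self-loops
    n_genes = adj.shape[0]
    return adj.sum(axis=0) / (n_genes - 1)      # normalized degree


def safe_correlation(u, v):
    """Pearson correlation of two vectors; 0.0 if either is constant."""
    if np.std(u) == 0 or np.std(v) == 0:
        return 0.0
    return float(np.corrcoef(u, v)[0, 1])


def classify_sample(test_sample, high_risk, low_risk, threshold=0.5):
    """Assign the test sample to the group whose network changes least.

    For each group, the topological vector is computed before and after
    adding the test sample; the group with the higher before/after
    correlation wins (ties go to the first group listed).
    """
    scores = {}
    for label, group in (("high", high_risk), ("low", low_risk)):
        before = degree_centrality(group, threshold)
        after = degree_centrality(np.vstack([group, test_sample]), threshold)
        scores[label] = safe_correlation(before, after)
    return max(scores, key=scores.get), scores
```

Here `high_risk` and `low_risk` would be the training expression matrices of the two groups, restricted to the selected genes, and `test_sample` a single expression vector over the same genes. With large groups, one added sample may leave the thresholded network unchanged in both groups, so a practical version would likely need a finer-grained topological quantity or tie-breaking rule.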