Informatics in Medicine Unlocked (Jan 2021)
Gene selection for cancer detection using graph signal processing
Abstract
Background: Gene databases are typically large and contain information on thousands of genes. The data are numerical and represent the expression levels of the genes. In patients with cancer, only a few genes have expression levels that differ significantly from those of healthy individuals. It is therefore important to select a small number of significant genes to facilitate cancer detection.
Method: We propose a novel gene selection algorithm that leverages techniques from graph signal processing. Patient graphs are first constructed from the genetic data: the nodes correspond to patients, and the links represent the genetic similarity between patients. The expression levels of each gene are modeled as a signal on the graph, and a variation measure for these graph signals is defined using the Laplacian matrix of the graph. This variation measure is a good indicator of the significance of a particular gene for cancer detection, and algorithms are developed on this basis to select the significant genes. The selected genes are then used as features in classifiers, such as naive Bayes and support vector machines (SVMs), for cancer detection.
Results: Classification experiments were performed on three gene expression datasets commonly used in cancer research: (i) prostate tumor, (ii) gastric cancer, and (iii) brain tumor. Comparisons with other feature selection algorithms demonstrated that the proposed algorithms generally outperform them in cancer detection.
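The pipeline described in the Method paragraph (patient graph, Laplacian, per-gene variation, ranking) can be illustrated with a minimal NumPy sketch. Several details are assumptions not stated in the abstract: patient similarity uses a Gaussian kernel on Euclidean distance with the median squared distance as bandwidth, each gene is standardized before scoring, the variation of a gene signal z is the Laplacian quadratic form z^T L z, and genes with the lowest variation (signals that are smooth on the patient graph) are treated as significant. The function names patient_graph, laplacian, gene_variation, and select_genes are hypothetical and do not come from the paper.

```python
from typing import Optional, Tuple

import numpy as np


def patient_graph(X: np.ndarray, scale: Optional[float] = None) -> np.ndarray:
    """Weighted adjacency over patients (rows of X = patients, columns = genes).

    Assumption: weights exp(-||x_i - x_j||^2 / scale), with `scale` defaulting
    to the median nonzero squared distance between patients.
    """
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared Euclidean distances; clip round-off negatives to 0.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    if scale is None:
        off = d2[~np.eye(len(X), dtype=bool)]
        pos = off[off > 0]
        scale = float(np.median(pos)) if pos.size else 1.0
    W = np.exp(-d2 / scale)
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W


def laplacian(W: np.ndarray) -> np.ndarray:
    """Combinatorial Laplacian L = D - W, where D is the diagonal degree matrix."""
    return np.diag(W.sum(axis=1)) - W


def gene_variation(X: np.ndarray, L: np.ndarray) -> np.ndarray:
    """Graph variation z_g^T L z_g of each standardized gene signal z_g.

    Since z^T L z = 1/2 * sum_ij W_ij (z_i - z_j)^2, a gene whose expression
    changes little between strongly linked patients gets a small score.
    """
    std = X.std(axis=0)
    const = std < 1e-12
    std[const] = 1.0  # avoid dividing by zero for constant genes
    Z = (X - X.mean(axis=0)) / std
    # For each gene g: sum over patients p, q of Z[p, g] * L[p, q] * Z[q, g].
    scores = np.einsum("pg,pq,qg->g", Z, L, Z)
    scores[const] = np.inf  # constant genes carry no information; rank last
    return scores


def select_genes(X: np.ndarray, k: int) -> Tuple[np.ndarray, np.ndarray]:
    """Return the indices of the k lowest-variation genes and all scores.

    Assumption: low variation is treated as "significant"; this ranking rule
    is illustrative and not taken from the paper.
    """
    scores = gene_variation(X, laplacian(patient_graph(X)))
    return np.argsort(scores)[:k], scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(scale=0.3, size=(20, 6))  # 20 patients, 6 genes (synthetic)
    X[:10, :2] += 2.0  # genes 0 and 1 separate two patient groups
    X[10:, :2] -= 2.0
    selected, scores = select_genes(X, k=2)
    print(sorted(selected.tolist()), np.round(scores, 1))
```

In this synthetic example the two genes that split the patients into groups vary little across the strong within-group links, so they receive lower scores than the pure-noise genes. The selected column indices would then be used to slice the expression matrix before training a classifier such as naive Bayes or an SVM.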