IEEE Access (Jan 2019)
Informative Feature Clustering and Selection for Gene Expression Data
Abstract
Feature selection aims to remove irrelevant and redundant features from input data. Selecting important genes is essential for gene expression data, which often contains a large number of genes. However, commonly used feature selection methods are usually biased toward the highest-ranked features, and the selected features may be highly correlated with one another. To overcome these problems, we propose an informative feature clustering and selection method that selects informative and diverse genes from gene expression data. The method consists of two steps. In the first step, a feature clustering (FC) method groups all genes into several gene clusters. In FC, a set of feature weights is computed to reflect the importance of each gene, and the genes within each cluster are sorted by these weights. In the second step, a stratified feature selection (SFS) method selects genes from the different gene clusters and combines them to form the final feature set. Experiments on several gene expression datasets demonstrate the superiority of the proposed method over six popular feature selection methods.
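The idea of the second step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the feature weights or clusters are computed, so both are taken here as given inputs (`weights` and `labels` are hypothetical toy values), and the round-robin rule in `stratified_select` is an assumed stand-in for SFS. The sketch only shows why drawing from every cluster yields a more diverse gene set than plain top-k ranking.

```python
from collections import defaultdict


def rank_within_clusters(weights, labels):
    """Group gene indices by cluster and sort each group by descending weight."""
    clusters = defaultdict(list)
    for gene, label in enumerate(labels):
        clusters[label].append(gene)
    return {
        label: sorted(genes, key=lambda g: weights[g], reverse=True)
        for label, genes in clusters.items()
    }


def stratified_select(weights, labels, k):
    """Pick up to k genes, taking the best remaining gene from each cluster in turn.

    Assumption: round-robin over clusters; the paper's SFS rule may differ.
    """
    ranked = rank_within_clusters(weights, labels)
    order = sorted(ranked)  # fixed cluster order for reproducibility
    selected = []
    depth = 0
    while len(selected) < k:
        added = False
        for label in order:
            if depth < len(ranked[label]) and len(selected) < k:
                selected.append(ranked[label][depth])
                added = True
        if not added:
            break  # every cluster is exhausted; fewer than k genes exist
        depth += 1
    return selected


# Hypothetical toy data: six genes, three clusters.
weights = [0.9, 0.8, 0.7, 0.3, 0.2, 0.4]  # assumed importance of each gene
labels = [0, 0, 0, 1, 1, 2]               # assumed cluster of each gene

# Plain ranking takes the three highest weights, all from cluster 0.
top_k = sorted(range(len(weights)), key=lambda g: weights[g], reverse=True)[:3]

# Stratified selection takes the best gene of each cluster instead.
stratified = stratified_select(weights, labels, 3)

print(top_k)       # [0, 1, 2]
print(stratified)  # [0, 3, 5]
```

On this toy input, plain ranking returns three genes from the same cluster, which are likely to be correlated, while the stratified variant returns one gene per cluster.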
Keywords