IEEE Access (Jan 2019)

Dimensionality Reduction in Gene Expression Data Sets

  • Jovani Taveira De Souza,
  • Antonio Carlos De Francisco,
  • Dayana Carla De Macedo

DOI
https://doi.org/10.1109/ACCESS.2019.2915519
Journal volume & issue
Vol. 7
pp. 61136 – 61144

Abstract

Dimensionality reduction is used in microarray data analysis to improve prediction quality, reduce computing time, and build more robust models. In addition, learning algorithms must cope with a very large number of attributes (genes) relative to the number of samples. In this paper, we therefore conducted a detailed comparison of two reduction methods, attribute selection and principal component analysis (PCA), for analyzing gene expression data sets. Both reduction methods were applied in the pre-processing stage and then evaluated experimentally. Furthermore, we introduced a combination of consistency-based subset evaluation (CSE) and minimum redundancy maximum relevance (mRMR), which we refer to as CSE-mRMR, to improve classification efficiency. The results indicated a significant increase in classifier hit rates with both methods compared to using all attributes. Under cross-validation, attribute selection consistently outperformed PCA across classifiers and data sets, and CSE-mRMR demonstrated good classification performance. Taken together, the literature and the current results suggest that attribute selection may be relevant to the analysis and future prediction of gene expression data sets.
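
As a rough illustration of the mRMR idea named in the abstract, the sketch below greedily selects attributes that share high mutual information with the class label while sharing little with attributes already chosen. It is not the authors' CSE-mRMR procedure: the CSE step is omitted, the mutual-information difference form of the criterion is an assumption, and the function names (`mutual_information`, `mrmr_select`) and the toy, already discretized expression values are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written in terms of counts
        mi += (c / n) * log(c * n / (px[x] * py[y]))
    return mi


def mrmr_select(columns, labels, k):
    """Greedy mRMR (difference form): repeatedly pick the attribute with the
    highest relevance minus mean redundancy against the attributes chosen so far."""
    relevance = [mutual_information(col, labels) for col in columns]
    selected = []
    remaining = list(range(len(columns)))

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = sum(
            mutual_information(columns[j], columns[s]) for s in selected
        ) / len(selected)
        return relevance[j] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (assumed): 8 samples, expression discretized to 0 = low, 1 = high.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0, 0, 0, 1, 1, 1, 1, 1],  # gene 0: informative
    [0, 0, 0, 1, 1, 1, 1, 1],  # gene 1: exact duplicate of gene 0 (redundant)
    [0, 0, 1, 0, 0, 1, 1, 1],  # gene 2: weaker but complementary signal
    [0, 1, 0, 1, 0, 1, 0, 1],  # gene 3: unrelated to the label
]

selected = mrmr_select(columns, labels, k=2)
print(selected)
```

On this toy input the duplicate gene is passed over in the second step even though its relevance equals that of the first pick, because its redundancy with the already selected gene outweighs that relevance.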

Keywords